Reference Model
Definition
In simulation-based functional verification, a design's actual behavior is checked by simulating the HDL implementation, driving stimuli into it, and comparing the observed behavior with the expected behavior implied by the specification. In this context, a reference model is the software or executable model that predicts how the design should behave for a given input: it accepts instructions or stimuli as input and generates the expected results used to evaluate the device. The same term is also used in software engineering with a different meaning: an abstract framework for system-interaction semantics that classifies bidirectional interactions as horizontal (stateful, asynchronous, nondeterministic, described by protocols) or vertical (asymmetric, described by object models, operations, or anonymous events). This article covers the verification meaning.
Role in a verification environment
A reference model provides the expected result stream against which the device under test (DUT) is checked. In the cited RISC-V vector processing unit (VPU) environment, a UVM scoreboard compares VPU results with results from the reference model. When an instruction completes, the scoreboard executes a comparison method; for vector instructions, the verification environment includes the destination vector register value extracted from the reference model, so that the scoreboard can compare the expected and observed register contents.
Reference models in hardware fuzzing
In differential hardware fuzzing, a reference model acts as a correctness oracle for randomly generated test programs. DifuzzRTL improved upon the earlier RFUZZ coverage-directed fuzzer by adding clock-sensitive optimization and incorporating a reference model, enabling better capture of state transitions and more effective checking of RTL execution results. The reference model in differential fuzzing is typically an ISA-level simulator executed in parallel with the DUT so that architectural-state mismatches can be flagged as bugs without requiring a hand-written checker for each instruction.
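The lock-step checking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy two-instruction ISA, the `ArchState` class, and the functions `ref_step`, `dut_step`, and `differential_run` are hypothetical names invented for this sketch and do not come from DifuzzRTL or any other tool. The DUT stand-in carries a deliberately injected bug so that the comparison has something to find.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Architectural state visible at the ISA level (toy: PC plus 4 registers)."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)


def ref_step(state, instr):
    """Hypothetical reference model: executes one instruction of a toy ISA."""
    op, rd, a, b = instr
    if op == "addi":      # rd = regs[a] + immediate b
        state.regs[rd] = (state.regs[a] + b) & 0xFFFFFFFF
    elif op == "add":     # rd = regs[a] + regs[b]
        state.regs[rd] = (state.regs[a] + state.regs[b]) & 0xFFFFFFFF
    else:
        raise ValueError(f"unknown opcode: {op}")
    state.pc += 4


def dut_step(state, instr):
    """Stand-in for the DUT. Injected bug: 'add' keeps only the low 16 bits."""
    op, rd, a, b = instr
    if op == "add":
        state.regs[rd] = (state.regs[a] + state.regs[b]) & 0xFFFF
        state.pc += 4
    else:
        ref_step(state, instr)


def differential_run(program, step_dut, step_ref):
    """Run both models in lock-step; return the first mismatch, or None."""
    dut, ref = ArchState(), ArchState()
    for index, instr in enumerate(program):
        step_dut(dut, instr)
        step_ref(ref, instr)
        if dut != ref:    # dataclass equality compares pc and regs
            return index, dut, ref
    return None


program = [
    ("addi", 1, 0, 0xFFFF),  # r1 = 0xFFFF
    ("addi", 2, 0, 1),       # r2 = 1
    ("add", 3, 1, 2),        # r3 = 0x10000 in the reference model
]
mismatch = differential_run(program, dut_step, ref_step)
clean = differential_run(program, ref_step, ref_step)
```

Here `mismatch` identifies the third instruction as the first point of divergence, while `clean` is `None` because a model compared against itself never diverges. No per-instruction checker is written by hand: the reference model's state is the only oracle.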
FPGA-accelerated concurrent execution (Lyra)
The Lyra RISC-V verification framework is a heterogeneous GPU-CPU-FPGA co-verification platform that places the reference model on the same FPGA System-on-Chip (SoC) as the DUT. Following the Encore architecture, the DUT runs on the programmable logic (PL) while a software ISA emulator serving as the reference model runs on the on-chip hardened ARM processors. Dedicated hardware checkers perform runtime differential checking of execution results at the instruction level, and register-level coverage points are instrumented directly on the FPGA so that coverage collection is no longer bottlenecked by software simulation. LyraGen, a 125-million-parameter domain-specialized generative model retrained from OPT-125M, produces the instruction streams that drive both the DUT and the reference model concurrently, with an on-FPGA encoding module translating each instruction into the model's tokenized format. This FPGA-resident reference model is what enables Lyra to report large end-to-end verification speedups over purely software-based differential fuzzers such as DifuzzRTL and Cascade.
Spike as a golden/reference model
The VPU environment used the RISC-V ISA simulator Spike for co-simulation inside the UVM environment. Spike had two main roles:
- executing scalar instructions and providing vector instructions to UVM in program order; and
- acting as the golden/reference model used to check DUT results.
To support these roles, Spike was modified with:
- SystemVerilog Direct Programming Interface (DPI) functions;
- a method that resumes simulation until a vector instruction is executed and returns reference results to UVM;
- functions for reading Spike's memory; and
- a mechanism to force reduction results into Spike to avoid divergence in unordered floating-point reductions.
Scoreboard comparison flow
When Spike finds a vector instruction, it provides the instruction, reference results, and other relevant data to UVM. The instruction is packed as a transaction and sent to the issue agent, then executed by the VPU. The reference-model results are compared with the VPU-generated results. This makes the reference model a central oracle for result checking, while the scoreboard performs the actual comparison inside the UVM environment. The destination vector register value pulled from the reference model at instruction-completion time is the key datum compared against the VPU's observed value.
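The flow above can be approximated with a small Python sketch. Everything in it is an assumption made for illustration: `ToyReferenceModel` stands in for the modified Spike, `ToyVpu` for the DUT, `Transaction` for the UVM transaction, and the two-opcode instruction set (`li`, `vadd.vx`) is a toy. None of these names reflect the actual Spike or UVM interfaces.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical transaction: a vector instruction plus the data the DUT needs."""
    instr: tuple
    scalar_operand: int
    expected_vd: list


class ToyReferenceModel:
    """Stand-in for the reference simulator: runs scalar code, stops at vector ops."""

    def __init__(self, program, vlen=4):
        self.program = list(program)
        self.pc = 0
        self.xregs = {}
        self.vregs = {}
        self.vlen = vlen

    def run_until_vector(self):
        """Execute scalar instructions until a vector instruction is found."""
        while self.pc < len(self.program):
            instr = self.program[self.pc]
            self.pc += 1
            if instr[0] == "li":                    # scalar: load immediate
                _, rd, imm = instr
                self.xregs[rd] = imm
            elif instr[0] == "vadd.vx":             # vector: vd = vs + x[rs]
                _, vd, vs, rs = instr
                src = self.vregs.get(vs, [0] * self.vlen)
                self.vregs[vd] = [x + self.xregs[rs] for x in src]
                return Transaction(instr, self.xregs[rs], list(self.vregs[vd]))
            else:
                raise ValueError(f"unknown opcode: {instr[0]}")
        return None                                 # program finished


class ToyVpu:
    """Stand-in for the DUT: executes only the vector instructions it is issued."""

    def __init__(self, vlen=4):
        self.vregs = {}
        self.vlen = vlen

    def execute(self, txn):
        _, vd, vs, _ = txn.instr
        src = self.vregs.get(vs, [0] * self.vlen)
        self.vregs[vd] = [x + txn.scalar_operand for x in src]
        return list(self.vregs[vd])


def run_scoreboard(ref, vpu):
    """Compare the destination register of every completed vector instruction."""
    checked, mismatches = 0, []
    while (txn := ref.run_until_vector()) is not None:
        observed = vpu.execute(txn)
        checked += 1
        if observed != txn.expected_vd:
            mismatches.append((txn.instr, txn.expected_vd, observed))
    return checked, mismatches


program = [
    ("li", "x1", 3),
    ("vadd.vx", "v1", "v0", "x1"),
    ("li", "x2", 5),
    ("vadd.vx", "v2", "v1", "x2"),
]
checked, mismatches = run_scoreboard(ToyReferenceModel(program), ToyVpu())
```

In this sketch the reference model owns program order and scalar execution, the DUT sees only vector transactions, and the scoreboard function performs the comparison. That mirrors the division of roles described above, not the real implementation.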
Handling legal model/DUT differences
A reference model may not always use the same legal algorithm as the DUT. The cited VPU environment encountered this issue with unordered floating-point reductions: the VPU used a different reduction algorithm from Spike, which the RISC-V vector (RVV) specification allows. Because floating-point addition is not associative, a different summation order can produce a different rounded result, so mismatches occurred occasionally even when the VPU result was correct. Leaving the mismatched value in Spike's registers could then cause further divergence when later instructions consumed it.
To address this, the verification team created an independent C reference model for unordered reductions. That model implemented exactly the same reduction algorithm as the DUT. For those cases, the VPU result was compared against the C reduction reference model instead of Spike; if the result matched, the value was injected into Spike's register state to keep later execution aligned. This illustrates a common practical pattern: a primary reference model (an ISA simulator) may be paired with narrower, DUT-algorithm-matched reference models to resolve ambiguities permitted by the specification.
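A minimal Python sketch shows why two legal summation orders can disagree and how the compare-then-inject step works. The assumptions are loud ones: the source does not say which reduction algorithm the VPU or Spike uses, so the sequential and pairwise-tree orders below are chosen purely for illustration, and `spike_regs` is a hypothetical dictionary standing in for Spike's register state.

```python
def sequential_sum(values):
    """One legal order: accumulate left to right (assumed for the primary model)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def tree_sum(values):
    """Another legal order: pairwise tree (assumed here for the DUT's algorithm)."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0.0


def check_reduction(dut_result, values, primary_regs, rd):
    """Check the DUT against the algorithm-matched model, then realign the
    primary model by injecting the accepted value into its register state."""
    if dut_result != tree_sum(values):
        return False                # genuine mismatch: report a bug
    primary_regs[rd] = dut_result   # inject so later instructions stay aligned
    return True


values = [1e16, 1.0, -1e16, 1.0]
primary = sequential_sum(values)    # 1.0: the first 1.0 is lost to rounding
matched = tree_sum(values)          # 0.0: both 1.0 terms are lost to rounding

spike_regs = {"f1": primary}        # hypothetical stand-in for Spike's registers
ok = check_reduction(matched, values, spike_regs, "f1")
```

With these inputs the two orders give `1.0` and `0.0`, so a direct comparison against the primary model would flag a correct DUT result. Checking against the matched model passes, and the injection leaves `spike_regs["f1"]` equal to the DUT's value for any later instruction that reads it.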
Automating reference-model construction
Reference models themselves are becoming more intricate and time-consuming to develop as integrated-circuit designs grow in complexity. ChatModel is an LLM-aided agile reference-model generation and verification platform that automates the transition from design specifications to fully functional reference models by integrating design standardization and hierarchical agile modeling. It employs a building-block generation strategy and, when evaluated on 300 designs of varying complexity, reported large efficiency and capacity gains over alternative generation methods. Such tools are increasingly relevant because the quality of the reference model directly bounds the bugs that a verification environment can detect.