Execution Trace Comparison Wiki

Definition

Execution trace comparison is a hardware-verification technique used in RISC-V testing where generated or injected instruction sequences are executed on more than one execution target, and the resulting execution traces are compared to identify behavioral divergence. In randomized RISC-V test frameworks, generated test programs may be executed on both a golden model and a processor under development; a RISCV-DV-style framework would typically detect divergence by comparing the execution traces.^[1]

Use in TestRIG

In TestRIG, an interactive Verification Engine, or VEngine, stimulates RISC-V implementations over RVFI-DII sockets. An RVFI-DII-compatible implementation can reset, consume instruction sequences, and report execution traces through its RVFI-DII interface.^[2]

A VEngine can drive one or more RVFI-DII-compatible implementations. It may use an internal RISC-V model, or it may drive two independent implementations and compare their RVFI traces. The TestRIG paper describes QCVEngine as an example of this approach.^[3]

QuickCheck-based comparison workflow

TestRIG's QCVEngine uses Haskell QuickCheck. It constructs a pass/fail function that receives a list of instructions, sends the instructions over two DII sockets, collects RVFI traces from those sockets, asserts that the traces match, and returns the result. QuickCheck then generates inputs in search of a failing case.^[4]

Because Direct Instruction Injection (DII) decouples the instruction stream from control flow, QCVEngine can use unmodified QuickCheck utilities to generate, compare, and shrink instruction sequences.^[5]

Relationship to trace-equivalence and shrinking

DII directly specifies the instruction sequence expected in the output trace and does not associate instructions with memory addresses. This requires custom pipeline instrumentation, but it simplifies sequence generation and shrinking because the program counter does not affect the instruction stream.^[6]

The TestRIG paper also reports replaying recorded test-suite examples, including riscv-tests and RISCV-DV, and adding full trace-equivalence checking with shrinking. The same mechanism was used to capture traces of an operating system booting on a model implementation and then aid bring-up of the operating system on other implementations, with instruction shrinking used to highlight problems.^[7]

Contrast with final-state-only checking

The TestRIG paper contrasts full trace comparison with PyH2P, noting that PyH2P compares only final memory and register state with its internal PyMTL3 model rather than performing full trace comparison.^[8]

^[1]: See citation "RISCV-DV-style divergence detection". ^[2]: See citation "TestRIG VEngine trace reporting". ^[3]: See citation "TestRIG RVFI trace comparison". ^[4]: See citation "QCVEngine two-socket trace assertion". ^[5]: See citation "DII enables QuickCheck generation comparison and shrinking". ^[6]: See citation "DII specifies expected output trace sequence". ^[7]: See citation "Recorded tests gain full trace-equivalence checking". ^[8]: See citation "PyH2P lacks full trace comparison".