Tandem Verification Wiki

Overview

Tandem Verification is a trace-comparison approach used in the TestRIG framework for RISC-V implementations. TestRIG generates random instruction sequences, executes the same sequences on a model and an implementation under test, and compares their execution traces. The cited paper describes this as a pragmatic compromise for checking equivalence at the processor level: it does not prove equivalence, but it can demonstrate divergence and can be used throughout development. [C1]

Role in TestRIG

TestRIG is a testing framework for RISC-V implementations. Its tandem-verification workflow is motivated by the difficulty of routinely proving whole-processor equivalence, especially for full out-of-order microarchitectures. Instead of relying on a proof of equivalence, TestRIG compares observable behavior between implementations. [C1]

In the cited setup, TestRIG uses:

RISC-V Formal Interface (RVFI) to observe the change in state after each instruction of the implementation under test. [C2]
Direct Instruction Injection (DII) to provide the next instruction from the test harness, rather than fetching it from program memory according to the CPU program counter. [C2]
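
The division of labour between DII and RVFI can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: ToyCore, RvfiRecord, addi and run_with_dii are hypothetical names, the core understands only the RV32I ADDI instruction, and the record keeps only a handful of fields loosely modelled on RVFI signals. It is not TestRIG's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RvfiRecord:
    """Hypothetical, heavily simplified per-instruction state-change report."""
    insn: int      # instruction word that was executed
    rd_addr: int   # destination register index (0 if nothing was written)
    rd_wdata: int  # value written to the destination register
    pc_rdata: int  # program counter before the instruction
    pc_wdata: int  # program counter after the instruction


class ToyCore:
    """Toy core that only understands ADDI; anything else acts as a no-op."""

    def __init__(self):
        self.regs = [0] * 32
        self.pc = 0x80000000

    def step(self, insn):
        # DII: the instruction word is supplied by the caller.
        # Nothing is fetched from memory at self.pc.
        opcode = insn & 0x7F
        funct3 = (insn >> 12) & 0x7
        rd = (insn >> 7) & 0x1F
        rs1 = (insn >> 15) & 0x1F
        imm = insn >> 20
        if imm & 0x800:          # sign-extend the 12-bit immediate
            imm -= 0x1000
        old_pc = self.pc
        self.pc = (self.pc + 4) & 0xFFFFFFFF
        if opcode == 0x13 and funct3 == 0 and rd != 0:
            self.regs[rd] = (self.regs[rs1] + imm) & 0xFFFFFFFF
            return RvfiRecord(insn, rd, self.regs[rd], old_pc, self.pc)
        return RvfiRecord(insn, 0, 0, old_pc, self.pc)


def addi(rd, rs1, imm):
    """Encode an RV32I ADDI instruction word."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13


def run_with_dii(core, instructions):
    """Inject each instruction in turn and collect one record per instruction."""
    return [core.step(insn) for insn in instructions]


trace = run_with_dii(ToyCore(), [addi(1, 0, 5), addi(2, 1, -3)])
for record in trace:
    print(record)
```

The harness, not the program counter, decides which instruction runs next, and the list of records is the trace that a comparison step would consume.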

The paper states that the work compares executable formal models, software ISA simulators, and simulated hardware designs, rather than completed fabricated chips. [C3]

Relationship to tandem execution

The cited paper uses the term tandem execution for running the same randomly generated instruction sequences on both a model and an implementation under test, then comparing their execution traces. Tandem Verification is the verification use of that trace-comparison pattern: a mismatch in the traces is treated as evidence of a divergence to investigate. [C1]
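
As a rough sketch of this pattern, the following runs one instruction sequence through a toy model and a toy implementation with a deliberately injected bug, then reports the first trace entry that differs. The two-instruction machine, the tuple encoding of instructions, and the names execute, random_sequence and first_divergence are all assumptions made for illustration and are not taken from TestRIG.

```python
import random

MASK = 0xFFFFFFFF


def execute(sequence, swap_sub_operands=False):
    """Run a sequence on a toy 32-register machine and return its trace.

    swap_sub_operands=True plays the role of a buggy implementation.
    """
    regs = [0] * 32
    trace = []
    for op, rd, a, b in sequence:
        if op == "addi":
            value = regs[a] + b           # b is an immediate
        elif swap_sub_operands:
            value = regs[b] - regs[a]     # injected bug: operands reversed
        else:
            value = regs[a] - regs[b]     # "sub": b is a register index
        if rd != 0:                       # register 0 stays zero
            regs[rd] = value & MASK
        trace.append((op, rd, regs[rd]))
    return trace


def random_sequence(rng, length):
    """Generate a random sequence of toy instructions."""
    sequence = []
    for _ in range(length):
        rd, a = rng.randrange(32), rng.randrange(32)
        if rng.random() < 0.5:
            sequence.append(("addi", rd, a, rng.randrange(-2048, 2048)))
        else:
            sequence.append(("sub", rd, a, rng.randrange(32)))
    return sequence


def first_divergence(model_trace, impl_trace):
    """Return the index of the first differing trace entry, or None."""
    for index, (expected, actual) in enumerate(zip(model_trace, impl_trace)):
        if expected != actual:
            return index
    if len(model_trace) != len(impl_trace):
        return min(len(model_trace), len(impl_trace))
    return None


fixed = [("addi", 1, 0, 7), ("addi", 2, 0, 3), ("sub", 3, 1, 2)]
print(first_divergence(execute(fixed), execute(fixed, swap_sub_operands=True)))

rng = random.Random(0)
sequence = random_sequence(rng, 50)
print(first_divergence(execute(sequence), execute(sequence, swap_sub_operands=True)))
```

For the fixed sequence the reported index is 2, the position of the sub instruction. A result of None only means that this particular sequence did not expose a difference, not that the two executors are equivalent.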

Failure discovery and reduction

When QCVEngine finds a counterexample, QuickCheck list shrinking can remove irrelevant instructions and retest the sequence. The paper also describes “smart shrinking”, which transforms sequences (for example, by propagating an output register to later input operands) so that irrelevant instructions can be eliminated while a smaller counterexample still reproduces the failure. [C4]
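
The remove-and-retest idea behind list shrinking can be sketched as follows. This is a simplification written for illustration, not QuickCheck's algorithm or QCVEngine's code: it drops one instruction at a time, and the toy machine and the names run, diverges and shrink are assumptions.

```python
MASK = 0xFFFFFFFF


def run(sequence, buggy):
    """Toy executor; buggy=True reverses the operands of "sub"."""
    regs = [0] * 32
    trace = []
    for op, rd, a, b in sequence:
        if op == "addi":
            value = regs[a] + b
        elif buggy:
            value = regs[b] - regs[a]
        else:
            value = regs[a] - regs[b]
        if rd != 0:                      # register 0 stays zero
            regs[rd] = value & MASK
        trace.append((rd, regs[rd]))
    return trace


def diverges(sequence):
    """True if the toy model and the toy buggy implementation disagree."""
    return run(sequence, buggy=False) != run(sequence, buggy=True)


def shrink(sequence, fails):
    """Greedily drop single instructions while the failure still reproduces."""
    current = list(sequence)
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if fails(candidate):
            current = candidate          # the dropped instruction was irrelevant
        else:
            index += 1                   # this instruction is needed; keep it
    return current


counterexample = [
    ("addi", 1, 0, 7),
    ("addi", 2, 0, 3),
    ("addi", 3, 0, 9),
    ("sub", 3, 1, 2),
    ("addi", 1, 1, 1),
]
print(shrink(counterexample, diverges))
# prints [('addi', 2, 0, 3), ('sub', 3, 1, 2)]
```

Every candidate is retested before it is accepted, so the reduced sequence is still a counterexample. The smart-shrinking transformations described in the paper are not modelled here.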

Some sequences can be marked non-shrinkable to preserve initialization that is needed to expose more useful divergences. For example, keeping floating-point register initialization in place avoids trivial failures caused by uninitialized floating-point registers, so that testing can instead reach divergences in exception conditions or rounding modes. [C5]
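
A small sketch may help show why such marking matters. It assumes a made-up oracle, classify, that stands in for a real tandem comparison and labels the kind of failure it sees. The instruction names finit, fadd and fdiv, the per-instruction shrinkable flag, and the failure labels are all hypothetical.

```python
def classify(sequence):
    """Hypothetical oracle: return a failure label, or None if nothing fails."""
    initialised = False
    for op, _shrinkable in sequence:
        if op == "finit":
            initialised = True
        elif op in ("fadd", "fdiv") and not initialised:
            return "uninitialised-register"   # trivial failure
        elif op == "fdiv":
            return "rounding-mode"            # the more useful divergence
    return None


def shrink(sequence, respect_marks=True):
    """Drop instructions one at a time, never dropping non-shrinkable ones."""
    current = list(sequence)
    index = 0
    while index < len(current):
        _op, shrinkable = current[index]
        candidate = current[:index] + current[index + 1:]
        may_drop = shrinkable or not respect_marks
        if may_drop and classify(candidate) is not None:
            current = candidate
        else:
            index += 1
    return current


# Each entry is (instruction name, shrinkable flag).
sequence = [("finit", False), ("fadd", True), ("fdiv", True)]

protected = shrink(sequence)
unprotected = shrink(sequence, respect_marks=False)

print(protected, classify(protected))
# prints [('finit', False), ('fdiv', True)] rounding-mode
print(unprotected, classify(unprotected))
# prints [('fdiv', True)] uninitialised-register
```

When the marks are ignored, the shrinker removes the initialization and settles on the trivial failure. When they are respected, the reduced sequence still reaches the rounding-mode case.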

Sequences may also include assertions, such as requiring the value written by the previous instruction to be non-zero. The paper notes that assertions can fail without a divergence, so sequences with assertions do not require tandem verification to discover a failure. [C6]
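
The following sketch shows how an assertion can fail while only one executor is running, with no second trace to compare against. The tuple encoding, the assert_prev_nonzero marker and the function first_failed_assertion are hypothetical names chosen for illustration. Treating an assertion with no preceding write as passing is also an assumption.

```python
MASK = 0xFFFFFFFF


def first_failed_assertion(sequence):
    """Run a sequence on one toy executor and check embedded assertions.

    Returns the index of the first failing assertion, or None.
    """
    regs = [0] * 32
    last_written = None                   # value written by the previous instruction
    for index, item in enumerate(sequence):
        if item[0] == "assert_prev_nonzero":
            if last_written is not None and last_written == 0:
                return index
            continue
        _op, rd, rs1, imm = item          # only "addi" exists in this toy machine
        value = (regs[rs1] + imm) & MASK
        if rd != 0:                       # register 0 stays zero
            regs[rd] = value
        last_written = regs[rd]
    return None


passing = [("addi", 1, 0, 4), ("assert_prev_nonzero",)]
failing = [("addi", 1, 0, 0), ("assert_prev_nonzero",)]

print(first_failed_assertion(passing))   # prints None
print(first_failed_assertion(failing))   # prints 1
```

No model is consulted here: the assertion is evaluated against a single executor's own results.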

Scope and limitations

Tandem Verification, as described in the cited paper, is a randomized testing and trace-comparison technique, not a formal proof of processor equivalence. Its value is in demonstrating divergence between a model and an implementation, including issues in instruction semantics, pipelines, and data caches, during development. [C1]