Differential Testing Wiki

Overview

Differential testing is a comparison-based testing technique: an implementation under test is executed on the same testcase as one or more reference implementations, and the resulting observable behavior is checked for equality. In the instruction-set-simulator (ISS) verification setting described in "Verifying Instruction Set Simulators using Coverage-guided Fuzzing", the ISS under test is verified by comparing its execution results with those of one or more reference ISSs.

The compared observations can include normal execution results as well as failures. The ISS-verification workflow reports mismatches, including crashes, and checks equality over result data such as register values and selected memory content.
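The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the cited work's implementation: the Observation record, its field names, and the diff_observations helper are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical record of what one ISS run exposed."""
    registers: Dict[str, int] = field(default_factory=dict)
    # address -> byte value, for the selected memory region only
    memory: Dict[int, int] = field(default_factory=dict)
    # None for a normal exit, otherwise a short failure label (assumed format)
    crash: Optional[str] = None


def diff_observations(under_test: Observation, reference: Observation) -> List[str]:
    """Return human-readable differences; an empty list means the runs agree."""
    diffs: List[str] = []
    # A crash on only one side is itself a mismatch.
    if under_test.crash != reference.crash:
        diffs.append(f"crash: {under_test.crash!r} != {reference.crash!r}")
    # Compare every register that either side reported.
    for name in sorted(set(under_test.registers) | set(reference.registers)):
        a = under_test.registers.get(name)
        b = reference.registers.get(name)
        if a != b:
            diffs.append(f"reg {name}: {a} != {b}")
    # Compare every selected memory address that either side reported.
    for addr in sorted(set(under_test.memory) | set(reference.memory)):
        a = under_test.memory.get(addr)
        b = reference.memory.get(addr)
        if a != b:
            diffs.append(f"mem 0x{addr:x}: {a} != {b}")
    return diffs
```

Treating the crash status as one more compared field keeps failures and normal results in the same equality check, which matches the idea that crashes are reported as mismatches.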

Workflow in ISS verification

In the cited ISS-verification workflow, differential testing is used after testcase generation:

1. A coverage-guided fuzzer generates a testset.
2. Each generated binary bytestream is interpreted as a sequence of instructions for the ISS under test.
3. The bytestream is embedded into a predefined ELF template to form an ELF testcase.
4. The ELF template provides an execution frame: prefix code initializes the ISS into a predefined initial state, including predefined register values so that all ISS implementations start from the same state; suffix code collects results and stops the simulation.
5. During testset evaluation, the ISS under test and a reference ISS execute the testcase, and their results are checked for equality.
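The workflow above can be sketched as follows. This is a toy model under stated assumptions: PREFIX and SUFFIX are placeholder byte strings rather than real ELF content, and reference_iss and buggy_iss are stand-in functions, not real simulators. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical placeholders for the template's prefix and suffix code. A real
# template would be an ELF file with headers and architecture-specific code.
PREFIX = b"<init-registers>"
SUFFIX = b"<collect-results-and-stop>"

Result = Dict[str, int]
Iss = Callable[[bytes], Result]


def build_testcase(bytestream: bytes) -> bytes:
    """Wrap fuzzer output in the execution frame: prefix + instructions + suffix."""
    return PREFIX + bytestream + SUFFIX


def evaluate_testset(
    testset: Iterable[bytes], under_test: Iss, reference: Iss
) -> List[Tuple[bytes, Result, Result]]:
    """Run every testcase on both simulators and collect the disagreements."""
    mismatches = []
    for bytestream in testset:
        testcase = build_testcase(bytestream)
        got = under_test(testcase)
        expected = reference(testcase)
        if got != expected:
            mismatches.append((bytestream, got, expected))
    return mismatches


def _body(testcase: bytes) -> bytes:
    """Strip the frame again; only needed by the toy simulators below."""
    return testcase[len(PREFIX):len(testcase) - len(SUFFIX)]


def reference_iss(testcase: bytes) -> Result:
    """Toy reference: the 'result register' is the byte sum modulo 256."""
    return {"x1": sum(_body(testcase)) & 0xFF}


def buggy_iss(testcase: bytes) -> Result:
    """Toy ISS under test with a deliberate defect for the byte value 0xFF."""
    body = _body(testcase)
    total = sum(body) & 0xFF
    if 0xFF in body:
        total ^= 1
    return {"x1": total}
```

Calling evaluate_testset([b"\x01\x02", b"\xff\x00"], buggy_iss, reference_iss) returns a single mismatch, for the second bytestream. Because both simulators receive the same framed testcase, any difference in the result comes from the simulators rather than from the starting state.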

Input scope

The ISS-verification approach is not limited to predefined instruction subsets. The cited work considers all possible instructions and instruction sequences, including illegal instructions, with the intent of exercising uncommon or error cases.

Interpreting mismatches

A mismatch is a signal for investigation, not automatically proof of an implementation bug. The cited work notes that mismatches can arise from configuration differences, such as different memory sizes or peripheral mappings in the address space. For example, a load/store instruction may succeed in one ISS and fail in another because of such configuration differences; the authors state that these mismatches are not considered bugs. Therefore, reported mismatches must be analyzed to determine whether they correspond to real ISS defects.
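One way to support this analysis is a pre-filter that flags load/store mismatches whose address is mapped in only one of the two simulators. The sketch below is an assumption-laden heuristic, not a procedure from the cited work: IssConfig, its mapped field, and likely_config_mismatch are hypothetical names, and a flagged mismatch still needs manual review.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IssConfig:
    """Hypothetical description of one simulator's address space."""
    name: str
    # Half-open [start, end) ranges covering memory and peripherals.
    mapped: Tuple[Tuple[int, int], ...]

    def is_mapped(self, addr: int) -> bool:
        return any(start <= addr < end for start, end in self.mapped)


def likely_config_mismatch(addr: int, a: IssConfig, b: IssConfig) -> bool:
    """True when exactly one ISS maps the accessed address.

    In that case a load/store is expected to succeed on one side and fail on
    the other, so the mismatch probably reflects configuration, not a defect.
    """
    return a.is_mapped(addr) != b.is_mapped(addr)
```

For example, with one ISS configured for 4 KiB of memory at address 0 and another for 8 KiB, an access at 0x1800 is flagged, while an access at 0x800 (mapped in both) or 0x3000 (mapped in neither) is not.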

Relationship to coverage-guided fuzzing

In the cited paper, differential testing is paired with coverage-guided fuzzing. The fuzzer builds a testset by generating instruction bytestreams and transforming them into ELF testcases; the resulting testset is then evaluated by comparing the ISS under test against reference ISS implementations.
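The testset-building half of this pairing can be illustrated with a generic coverage-guided loop: a mutated input is kept only if it reaches coverage not seen before. This is a simplified sketch of the general technique, not the fuzzer used in the cited paper. The mutation strategy, the toy_coverage function (which treats the top two bits of each byte as an "instruction class"), and all names are assumptions.

```python
import random
from typing import Callable, FrozenSet, List, Set


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte and occasionally append another."""
    buf = bytearray(data) or bytearray(1)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    if rng.random() < 0.3:
        buf.append(rng.randrange(256))
    return bytes(buf)


def build_testset(
    coverage_of: Callable[[bytes], FrozenSet[int]],
    seed: bytes,
    iterations: int,
    rng: random.Random,
) -> List[bytes]:
    """Keep a candidate only when it reaches coverage not seen so far."""
    testset = [seed]
    seen: Set[int] = set(coverage_of(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(testset), rng)
        new = set(coverage_of(candidate)) - seen
        if new:
            seen |= new
            testset.append(candidate)
    return testset


def toy_coverage(data: bytes) -> FrozenSet[int]:
    """Toy coverage signal: the set of top-two-bit values among the bytes."""
    return frozenset(b >> 6 for b in data)
```

A call such as build_testset(toy_coverage, b"\x00", 200, random.Random(0)) yields a small testset in which every entry after the seed contributed new coverage. The resulting bytestreams would then be framed as testcases and evaluated differentially.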