STIMSMITH

Differential Testing

Technique WIKI v2 · 5/29/2026

Differential testing is a comparison-based validation technique in which an implementation under test is executed on the same testcases as one or more reference implementations, and their observable results are checked for equality. In instruction-set-simulator (ISS) verification, a coverage-guided fuzzer generates instruction-stream testcases, and the simulator under test is then compared against reference ISSs; differences in register values, selected memory contents, and crash behavior are reported as mismatches for triage.

Overview

Differential testing executes the same testcase on an implementation under test and on one or more reference implementations, then checks the resulting observable behavior for equality. In the instruction-set-simulator (ISS) verification setting described in Verifying Instruction Set Simulators using Coverage-guided Fuzzing, the ISS under test is verified by comparing its execution results with those of one or more reference ISSs.

The compared observations can include normal execution results as well as failures. The ISS-verification workflow reports mismatches, including crashes, and checks equality over result data such as register values and selected memory content.
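A comparison over these observations can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: the `Observation` structure, its field names, and the `diff_observations` helper are all hypothetical, and the register and memory values are assumed to have been collected already.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Result of one ISS run (hypothetical structure, not from the cited paper)."""
    registers: Dict[str, int] = field(default_factory=dict)
    memory: Dict[int, int] = field(default_factory=dict)  # selected addresses only
    crash: Optional[str] = None  # crash description, or None for a normal exit


def diff_observations(dut: Observation, ref: Observation) -> List[str]:
    """Return human-readable mismatches; an empty list means the runs agree."""
    mismatches = []
    # A crash on one side only is itself a reportable mismatch.
    if dut.crash != ref.crash:
        mismatches.append(f"crash: dut={dut.crash!r} ref={ref.crash!r}")
    # Compare every register seen by either simulator.
    for name in sorted(set(dut.registers) | set(ref.registers)):
        a, b = dut.registers.get(name), ref.registers.get(name)
        if a != b:
            mismatches.append(f"reg {name}: dut={a} ref={b}")
    # Compare only the selected memory addresses that were recorded.
    for addr in sorted(set(dut.memory) | set(ref.memory)):
        a, b = dut.memory.get(addr), ref.memory.get(addr)
        if a != b:
            mismatches.append(f"mem {addr:#x}: dut={a} ref={b}")
    return mismatches
```

Returning a list of differences rather than a single boolean keeps the register, memory, and crash signals separate, which helps when mismatches are analyzed later.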

Workflow in ISS verification

In the cited ISS-verification workflow, differential testing is used after testcase generation:

  1. A coverage-guided fuzzer generates a testset.
  2. Each generated binary bytestream is interpreted as a sequence of instructions for the ISS under test.
  3. The bytestream is embedded into a predefined ELF template to form an ELF testcase.
  4. The ELF template provides an execution frame: prefix code initializes the ISS into a predefined initial state, including predefined register values so that all ISS implementations start from the same state; suffix code collects results and stops the simulation.
  5. During testset evaluation, the ISS under test and each reference ISS execute the testcase, and their results are checked for equality.
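The steps above can be sketched in a few lines. This is a toy model under stated assumptions: `PREFIX` and `SUFFIX` are placeholder byte strings rather than real ELF template code, and the two "simulators" are hypothetical stand-in functions, one with an injected defect so that the comparison has something to find.

```python
from typing import Callable, Dict, List

# Placeholder byte strings standing in for the template's prefix and suffix code.
# A real template would be an ELF file; these values are purely illustrative.
PREFIX = b"\x00INIT"
SUFFIX = b"DONE\x00"

Result = Dict[str, int]
Iss = Callable[[bytes], Result]


def build_testcase(bytestream: bytes) -> bytes:
    """Embed a fuzzer-generated bytestream between prefix and suffix code."""
    return PREFIX + bytestream + SUFFIX


def evaluate(testset: List[bytes], dut: Iss, references: List[Iss]) -> List[bytes]:
    """Return the bytestreams on which the ISS under test disagrees with any reference."""
    failing = []
    for bytestream in testset:
        testcase = build_testcase(bytestream)
        dut_result = dut(testcase)
        if any(ref(testcase) != dut_result for ref in references):
            failing.append(bytestream)
    return failing


# Toy stand-ins for simulators: each "executes" by summing the payload bytes.
def reference_iss(testcase: bytes) -> Result:
    payload = testcase[len(PREFIX):len(testcase) - len(SUFFIX)]
    return {"x1": sum(payload) & 0xFF}


def buggy_iss(testcase: bytes) -> Result:
    result = reference_iss(testcase)
    if b"\xff" in testcase:  # injected defect for demonstration only
        result["x1"] ^= 1
    return result


if __name__ == "__main__":
    print(evaluate([b"\x01\x02", b"\xff\x01"], buggy_iss, [reference_iss]))
```

Because every simulator receives the identical testcase, including the same prefix, any difference in the collected results comes from the simulators themselves or from their configuration.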

Input scope

The ISS-verification approach is not limited to predefined instruction subsets. The cited work considers all possible instructions and instruction sequences, including illegal instructions, with the intent of exercising uncommon or error cases.
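One way to picture unrestricted input is to split an arbitrary bytestream into instruction words without rejecting anything. The sketch below assumes a fixed 32-bit little-endian instruction width and zero-padding of a short tail; both are illustrative assumptions, and `to_instruction_words` is a hypothetical helper rather than part of the cited work.

```python
import struct
from typing import List


def to_instruction_words(bytestream: bytes) -> List[int]:
    """Split a bytestream into 32-bit little-endian words, zero-padding the tail.

    No word is filtered out: encodings that a decoder would reject as illegal
    are kept, so every simulator has to handle them.
    """
    padded = bytestream + b"\x00" * (-len(bytestream) % 4)
    return [word for (word,) in struct.iter_unpack("<I", padded)]


if __name__ == "__main__":
    for word in to_instruction_words(b"\x13\x00\x00\x00\xff\xff"):
        print(f"{word:#010x}")
```

A generator restricted to known-valid encodings would skip exactly the words whose handling is most likely to differ between simulators.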

Interpreting mismatches

A mismatch is a signal for investigation, not automatically proof of an implementation bug. The cited work notes that mismatches can arise from configuration differences, such as different memory sizes or peripheral mappings in the address space. For example, a load/store instruction may succeed in one ISS and fail in another because of such configuration differences; the authors state that these mismatches are not considered bugs. Therefore, reported mismatches must be analyzed to determine whether they correspond to real ISS defects.
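A simple pre-filter for such configuration-related mismatches might look like the sketch below. It is a heuristic of our own for illustration, not a procedure from the cited work: the memory maps, the `Region` type, and the `triage` labels are hypothetical, and a real triage step would still need manual review.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (base address, size in bytes)


def is_mapped(addr: int, regions: List[Region]) -> bool:
    """True if addr falls inside any region of the given memory map."""
    return any(base <= addr < base + size for base, size in regions)


def triage(addr: int, dut_map: List[Region], ref_map: List[Region]) -> str:
    """Label a load/store mismatch at addr using the two simulators' memory maps."""
    # If only one simulator maps the address, the differing outcome is
    # explained by configuration rather than by a simulator defect.
    if is_mapped(addr, dut_map) != is_mapped(addr, ref_map):
        return "configuration difference"
    return "needs manual analysis"


if __name__ == "__main__":
    dut_map = [(0x8000_0000, 0x1000)]  # hypothetical: 4 KiB of RAM
    ref_map = [(0x8000_0000, 0x2000)]  # hypothetical: 8 KiB of RAM
    print(triage(0x8000_1800, dut_map, ref_map))
    print(triage(0x8000_0010, dut_map, ref_map))
```

The second label is deliberately not "bug": an address mapped identically in both simulators only rules out one known cause of benign mismatches.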

Relationship to coverage-guided fuzzing

In the cited paper, differential testing is paired with coverage-guided fuzzing. The fuzzer builds a testset by generating instruction bytestreams and transforming them into ELF testcases; the resulting testset is then evaluated by comparing the ISS under test against reference ISS implementations.
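The testset-building half of this pairing can be illustrated with a toy coverage-guided loop. Everything here is an assumption made for the sketch: `toy_coverage` is a hypothetical stand-in for real coverage feedback from an instrumented simulator, and the single-byte mutation is far simpler than what production fuzzers do. At least one seed input is required.

```python
import random
from typing import Callable, List, Set


def toy_coverage(data: bytes) -> Set[int]:
    """Hypothetical coverage signal: the set of low-2-bit values seen in the input."""
    return {b & 0x3 for b in data}


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte and occasionally append another."""
    buf = bytearray(data or b"\x00")
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    if rng.random() < 0.3:
        buf.append(rng.randrange(256))
    return bytes(buf)


def build_testset(
    coverage_of: Callable[[bytes], Set[int]],
    seeds: List[bytes],
    iterations: int,
    seed: int = 0,
) -> List[bytes]:
    """Keep only mutated inputs that reach coverage not seen before."""
    rng = random.Random(seed)
    testset = list(seeds)
    covered: Set[int] = set()
    for s in testset:
        covered |= coverage_of(s)
    for _ in range(iterations):
        candidate = mutate(rng.choice(testset), rng)
        new = coverage_of(candidate) - covered
        if new:  # candidate exercises something new, so it joins the testset
            testset.append(candidate)
            covered |= new
    return testset


if __name__ == "__main__":
    print(build_testset(toy_coverage, [b"\x00"], iterations=200))
```

The returned testset is what would then be turned into ELF testcases and run on the ISS under test and the reference ISSs for comparison.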

CITATIONS

[1] Differential testing in the ISS-verification setting compares execution results from an ISS under test with one or more reference ISSs. Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[2] The workflow checks equality of results and can report mismatches including crashes, using observations such as register values and selected memory content. Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[3] Coverage-guided ISS verification first generates a testset with a fuzzer and then evaluates that testset by comparing the ISS under test with reference ISSs. Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[4] Generated binary bytestreams are interpreted as instruction sequences, embedded into ELF testcases, and run with template code that initializes a shared initial state and collects results. Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[5] The ISS-verification approach considers all instructions and instruction sequences, including illegal instructions, to exercise uncommon or error cases. Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[6] Mismatches require analysis because configuration differences, such as memory size or peripheral mappings, can cause differing behavior without representing ISS bugs. Verifying Instruction Set Simulators using Coverage-guided Fuzzing

VERSION HISTORY

v2 · 5/29/2026 · gpt-5.5 (current)
v1 · 5/28/2026 · gpt-5.5