Differential Testing

Technique

Differential testing is a comparison-based validation technique in which an implementation under test is executed on the same testcases as one or more reference implementations, and their observable results are checked for equality. In instruction-set-simulator (ISS) verification, coverage-guided fuzzing can generate instruction-stream testcases; the simulator under test is then compared against reference ISSs, with differences in register values, selected memory contents, and crash behavior serving as triage signals.

First seen 5/28/2026

Last seen 7/20/2026

Evidence 39 chunks

Overview

Differential testing is a comparison-based testing technique: an implementation under test is executed on the same testcase as one or more reference implementations, and the resulting observable behavior is checked for equality. In the instruction-set-simulator (ISS) verification setting described in "Verifying Instruction Set Simulators using Coverage-guided Fuzzing", the ISS under test is verified by comparing its execution results with those of one or more reference ISSs.

The compared observations can include normal execution results as well as failures. The ISS-verification workflow reports mismatches, including crashes, and checks equality over result data such as register values and selected memory content.
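
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual implementation: the `Observation` record, the `Simulator` callable type, and the two toy simulators are all hypothetical names introduced here, and a real ISS would be driven as an external process rather than a Python function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    """Observable result of one testcase on one simulator (hypothetical record)."""
    registers: Dict[str, int] = field(default_factory=dict)
    memory: Dict[int, int] = field(default_factory=dict)  # selected addresses only
    crashed: bool = False


# Assumption: a simulator is modelled as a function from testcase bytes to an Observation.
Simulator = Callable[[bytes], Observation]


def run_safely(sim: Simulator, testcase: bytes) -> Observation:
    """Run a simulator and turn any exception into a 'crashed' observation."""
    try:
        return sim(testcase)
    except Exception:
        return Observation(crashed=True)


def diff_observations(test: Observation, ref: Observation) -> List[str]:
    """List every difference between the simulator under test and one reference."""
    if test.crashed != ref.crashed:
        return [f"crash: under_test={test.crashed} reference={ref.crashed}"]
    mismatches = []
    for name in sorted(set(test.registers) | set(ref.registers)):
        a, b = test.registers.get(name), ref.registers.get(name)
        if a != b:
            mismatches.append(f"register {name}: {a} != {b}")
    for addr in sorted(set(test.memory) | set(ref.memory)):
        a, b = test.memory.get(addr), ref.memory.get(addr)
        if a != b:
            mismatches.append(f"memory {addr:#x}: {a} != {b}")
    return mismatches


def differential_test(testcase: bytes, under_test: Simulator,
                      references: Dict[str, Simulator]) -> Dict[str, List[str]]:
    """Compare the simulator under test with each reference; keep only mismatches."""
    observed = run_safely(under_test, testcase)
    report = {}
    for ref_name, ref in references.items():
        mismatches = diff_observations(observed, run_safely(ref, testcase))
        if mismatches:
            report[ref_name] = mismatches
    return report


# Toy stand-ins for real simulators (purely illustrative).
def toy_reference(testcase: bytes) -> Observation:
    return Observation(registers={"x1": sum(testcase) & 0xFF},
                       memory={0x1000: len(testcase)})


def toy_buggy(testcase: bytes) -> Observation:
    # Deliberate bug: the last byte is left out of the sum.
    return Observation(registers={"x1": sum(testcase[:-1]) & 0xFF},
                       memory={0x1000: len(testcase)})


report = differential_test(b"\x01\x02", toy_buggy, {"ref": toy_reference})
print(report)
```

An empty report means the simulator under test agreed with every reference on that testcase; a non-empty one is a mismatch to be analyzed, not yet a confirmed bug.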

RELATIONSHIPS

29 connections

ProcessorFuzz ← implements 97% 7e

ProcessorFuzz uses differential testing with ISA simulation as a reference.

DiFuzzRTL ← implements 98% 6e

DiFuzzRTL uses differential testing with an ISA simulation as a golden reference.

APSR status register uses → 100% 2e

The differential testing engine includes the APSR status register as part of the CPU state for comparison.

ISA Simulation uses → 100% 2e

Differential testing uses ISA simulation as a reference model to compare against RTL simulation.

RTL simulation uses → 100% 2e

Differential testing compares RTL simulation output against ISA simulation output.

StimulusRL ← implements 95% 2e

StimulusRL uses differential testing via bug oracles to detect design defects.

DiFuzzRTL ← uses 90% 2e

DifuzzRTL exemplifies differential testing applied to RTL fuzzing.

HiFuzz ← uses 85% 2e

HiFuzz uses differential testing against Spike as the reference model to detect bugs.

SiliFuzz ← implements 95% 2e

SiliFuzz cross-validates different CPU cores, which is a form of differential testing.

SiliFuzz: Fuzzing CPUs by Proxy ← mentions 95% 2e

The paper identifies SiliFuzz as a kind of differential testing that cross-validates CPU cores.

Mu2 ← implements 100% 2e

Mu2 uses differential testing as its oracle for mutation testing within the fuzzing loop.

Mu2 ← uses 100% 2e

Differential testing is used by Mu2 as the oracle for determining mutant killing in the fuzzing loop.

hardware fuzzing ← uses 90% 2e

Hardware fuzzing uses differential testing to compare RTL and ISA simulator outputs.

DiffSpec ← implements 99% 2e

DiffSpec is a framework that realizes differential testing using LLMs and prompt chaining.

Examiner ← uses 100% 2e

Examiner uses differential testing to compare instruction execution between emulators and real devices.

CPU state comparison uses → 100% 2e

Differential testing uses CPU state comparison to identify inconsistent instructions between emulators and real devices.

Python DV evaluation harness ← implements 90% 1e

The Python harness implements differential testing by comparing golden model and buggy variant outputs.

Instruction Set Simulator uses → 100% 1e

Differential testing uses an ISS as a reference for comparison.

ISA simulation uses → 95% 1e

Differential testing in processor fuzzing compares ISA simulation results against RTL simulation results.

SpecDoctor ← uses 100% 1e

SpecDoctor utilizes differential testing to detect sensitive data leakage.

Verifying Instruction Set Simulators using Coverage-guided Fuzzing ← uses 95% 1e

The paper uses differential testing by comparing execution results across multiple ISSs.

TurboFuzz ← implements 96% 1e

TurboFuzz uses differential testing by comparing DUT results with an ISA emulator.

Bug Oracle uses → 93% 1e

Differential testing uses the comparison of two implementations' outputs as its bug oracle.

DITWO ← implements 95% 1e

DITWO leveraged differential testing to uncover missed Wasm optimization opportunities.

WADIFF ← implements 97% 1e

WADIFF is described as the first differential testing framework for Wasm.

Mokav ← implements 97% 1e

Mokav is an LLM-guided differential testing technique targeting Python program versions.

GenHuzz ← implements 95% 1e

GenHuzz uses differential testing by comparing DUT outputs against the Golden Reference Model.

HARTBREAKER ← uses 85% 1e

HARTBREAKER, like other fuzzers, relies on differential testing using an ISS or reference hart.

Cascade ← uses 95% 1e

Cascade relies on differential testing for verification.

LINKED ENTITIES

1 link

Verifying Instruction Set Simulators using Coverage-guided Fuzzing USES Extracted graph relationship

CITATIONS

6 sources

[1] Differential testing in the ISS-verification setting compares execution results from an ISS under test with one or more reference ISSs. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing

[2] The workflow checks equality of results and can report mismatches including crashes, using observations such as register values and selected memory content. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing

[3] Coverage-guided ISS verification first generates a testset with a fuzzer and then evaluates that testset by comparing the ISS under test with reference ISSs. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing

[4] Generated binary bytestreams are interpreted as instruction sequences, embedded into ELF testcases, and run with template code that initializes a shared initial state and collects results. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing

[5] The ISS-verification approach considers instruction sequences including illegal instructions to exercise uncommon error cases. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing

[6] Mismatches require analysis because configuration differences, such as memory size or peripheral mappings, can cause differing behavior without representing ISS bugs. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing
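
The triage concern in [6] can be illustrated with a small heuristic. The sketch below is an assumption-laden example rather than anything taken from the source: `SimConfig`, its fields, and the two labels are hypothetical, and the rule it applies (a memory mismatch only counts as a bug candidate if the address is ordinary RAM in both simulators) is just one plausible first-pass filter. Human analysis is still needed for whatever it flags.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimConfig:
    """Hypothetical description of one simulator's memory map."""
    mem_base: int
    mem_size: int
    peripherals: Tuple[Tuple[int, int], ...] = ()  # half-open (start, end) ranges

    def is_plain_memory(self, addr: int) -> bool:
        """True if addr is inside RAM and not claimed by a peripheral."""
        in_ram = self.mem_base <= addr < self.mem_base + self.mem_size
        in_peripheral = any(start <= addr < end for start, end in self.peripherals)
        return in_ram and not in_peripheral


def triage_address(addr: int, a: SimConfig, b: SimConfig) -> str:
    """Label a mismatching memory address for first-pass triage."""
    if a.is_plain_memory(addr) and b.is_plain_memory(addr):
        return "candidate-bug"
    # At least one simulator treats this address differently by configuration.
    return "configuration-difference"


# Illustrative configurations: different RAM sizes, one extra peripheral window.
under_test = SimConfig(mem_base=0x8000_0000, mem_size=0x1000)
reference = SimConfig(mem_base=0x8000_0000, mem_size=0x2000,
                      peripherals=((0x8000_0100, 0x8000_0200),))

for addr in (0x8000_0010, 0x8000_0150, 0x8000_1800):
    print(hex(addr), triage_address(addr, under_test, reference))
```

Filtering this way keeps configuration-induced differences out of the bug queue while leaving mismatches in memory that both simulators model identically for closer inspection.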