Coverage-based Greybox Fuzzing Wiki

Overview

Coverage-based greybox fuzzing (CGF) is a fuzzing methodology that uses runtime coverage information as feedback for generating and selecting test inputs. Public literature describes it as a dominant methodology for vulnerability discovery, widely applied to application software as well as system software such as kernels and firmware. It is also the basis for specialized variants such as grammar-aware fuzzing for structured inputs, multi-target fuzzing across cooperating software components, and hardware fuzzing of RTL designs.

Core feedback loop

The defining feature of CGF is that execution feedback influences the fuzzer's search. Inputs that exercise new or otherwise interesting coverage are retained or prioritized for further mutation. The exact definition of "coverage" is target-dependent: software fuzzers commonly use code-coverage signals, while hardware-oriented adaptations may use state or register-derived coverage signals.
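
As a minimal sketch of this loop, the Python below keeps any mutated input that reaches a branch ID not seen before. The target `toy_target`, its branch IDs, the single-byte mutator, and the uniform parent selection are all assumptions made for illustration; real fuzzers obtain coverage from instrumentation and use richer mutation and scheduling.

```python
import random

def toy_target(data):
    """Hypothetical instrumented target: returns the set of branch IDs it hit."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add(3)
    return covered

def mutate(data, rng):
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)

def fuzz(seeds, iterations, rng):
    """Coverage-guided loop: retain inputs that add new branch IDs."""
    corpus = list(seeds)
    global_coverage = set()
    for seed in corpus:
        global_coverage |= toy_target(seed)
    for _ in range(iterations):
        parent = rng.choice(corpus)        # select an input from the corpus
        child = mutate(parent, rng)        # generate a new candidate
        cov = toy_target(child)            # run it and collect feedback
        if not cov <= global_coverage:     # any branch not seen before?
            global_coverage |= cov
            corpus.append(child)           # keep it for further mutation
    return corpus, global_coverage

corpus, coverage = fuzz([b"AAAA"], iterations=20000, rng=random.Random(0))
print(len(corpus), sorted(coverage))
```

Without the feedback check, every mutation would start from the original seed, and reaching the nested branches would require guessing several bytes at once rather than one at a time.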

Software adaptations

AFL is cited in the Superion work as a successful coverage-based greybox fuzzer for relatively simple test inputs. Superion extends this style of fuzzing for structured inputs such as XML and JavaScript by adding grammar-aware trimming and mutation. Its approach uses an input grammar, parses test inputs into abstract syntax trees, trims at the tree level, and mutates via enhanced dictionary-based mutation and subtree replacement. In the reported evaluation on libplist and JavaScript engines, Superion improved line and function coverage over AFL and jsfunfuzz and found new bugs and vulnerabilities.
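
The subtree-replacement idea can be sketched on a toy tree representation. The `Node` class, the rule names, and the two example inputs are hypothetical, and restricting replacement to subtrees of the same kind is a simplifying choice made here rather than something the text above specifies; a real implementation would obtain trees from a grammar-driven parser.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                      # grammar rule name (hypothetical)
    text: str = ""                                 # token text for leaf nodes
    children: list = field(default_factory=list)

def unparse(node):
    """Serialize a tree back into an input string."""
    if not node.children:
        return node.text
    return "".join(unparse(c) for c in node.children)

def subtrees(node):
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from subtrees(child)

def clone(node):
    """Deep-copy a tree so mutation never touches the original."""
    return Node(node.kind, node.text, [clone(c) for c in node.children])

def replace_subtree(target, donor, rng):
    """Copy target and replace one of its subtrees with a same-kind donor subtree."""
    mutated = clone(target)
    slots = [(p, i) for p in subtrees(mutated) for i in range(len(p.children))]
    rng.shuffle(slots)
    for parent, i in slots:
        wanted = parent.children[i].kind
        candidates = [s for s in subtrees(donor) if s.kind == wanted]
        if candidates:
            parent.children[i] = clone(rng.choice(candidates))
            break
    return mutated

xml_a = Node("doc", children=[Node("open", "<a>"), Node("text", "hello"), Node("close", "</a>")])
xml_b = Node("doc", children=[Node("open", "<b>"), Node("text", "world"), Node("close", "</b>")])
mutant = replace_subtree(xml_a, xml_b, random.Random(1))
print(unparse(mutant))
```

Because the mutation operates on whole subtrees, the result stays token-aligned with the grammar, which is harder to achieve with byte-level mutation of structured inputs.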

Multi-target coverage-based greybox fuzzing extends the feedback idea across cooperating software components. The MTCFuzz public abstract describes a setting where an operating system and firmware operate cooperatively through interfaces such as OpenSBI on RISC-V or OP-TEE on ARM. Instead of measuring only a single target, the proposed method uses coverage from each cooperating component as feedback and runs the system in QEMU so coverage can be measured across software boundaries.
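
One way to picture per-component feedback is a coverage set per component, with an input counted as interesting when it adds coverage in any of them. This is a sketch under that assumption; the component names `os` and `firmware`, the edge IDs, and the "any component" rule are illustrative and not taken from the MTCFuzz abstract.

```python
def update_multi_target(global_cov, run_cov):
    """Merge one run's per-component coverage into global_cov.

    Returns the list of components in which this run found new coverage.
    """
    gained = []
    for component, edges in run_cov.items():
        seen = global_cov.setdefault(component, set())
        new_edges = edges - seen
        if new_edges:
            seen |= new_edges
            gained.append(component)
    return gained

global_cov = {}
first = update_multi_target(global_cov, {"os": {1, 2}, "firmware": {10}})
second = update_multi_target(global_cov, {"os": {1, 2}, "firmware": {10, 11}})
third = update_multi_target(global_cov, {"os": {2}, "firmware": {11}})
print(first, second, third)
```

In this example the second run adds nothing on the `os` side, so a fuzzer that measured only that component would discard it even though it reached new firmware code.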

Hardware and processor fuzzing

CGF has also been adapted to hardware fuzzing. DIFUZZRTL adapts coverage-guided fuzzing to RTL simulation to capture finite-state-machine state transitions. Its register-coverage strategy first performs static analysis to identify a small set of registers in each RTL module, then instruments the RTL with logic that records coverage during simulation. At a high level, it builds a circuit graph of the RTL design, performs backward data-flow analysis from each multiplexer selection signal to find the registers whose values directly or indirectly control that signal, and hashes the identified register values into a coverage map on each clock cycle.
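
The two stages described above can be sketched in Python over a toy circuit graph. The signal names, the `drivers` dictionary format, the map size, and the CRC-based hash are all assumptions for illustration, not DIFUZZRTL's actual data structures or hash function. The sketch also stops the backward traversal at the first register on each path, which is one possible reading of "directly or indirectly".

```python
import zlib

MAP_SIZE = 1 << 12  # hypothetical coverage-map size

def control_registers(drivers, registers, mux_selects):
    """Walk backward from each mux select signal and collect the registers reached."""
    found = set()
    for select in mux_selects:
        stack, visited = [select], set()
        while stack:
            signal = stack.pop()
            if signal in visited:
                continue
            visited.add(signal)
            if signal in registers:
                found.add(signal)
                continue  # assumption: stop at the first register on a path
            stack.extend(drivers.get(signal, []))
    return found

def coverage_index(values, names):
    """Hash the selected register values for one clock cycle into a map index."""
    data = b"".join(values[n].to_bytes(8, "little") for n in names)
    return zlib.crc32(data) % MAP_SIZE

# Toy design: each signal maps to the signals that drive it.
drivers = {
    "mux0.sel": ["cmp_out"],
    "cmp_out": ["state", "busy"],
    "mux1.sel": ["busy"],
    "acc_next": ["acc", "mux0.out"],
}
registers = {"state", "busy", "acc"}
names = sorted(control_registers(drivers, registers, ["mux0.sel", "mux1.sel"]))

# One dictionary of register values per simulated clock cycle.
trace = [
    {"state": 0, "busy": 0, "acc": 5},
    {"state": 1, "busy": 0, "acc": 9},
    {"state": 1, "busy": 0, "acc": 12},
]
coverage_map = {coverage_index(cycle, names) for cycle in trace}
print(names, len(coverage_map))
```

In this toy design `acc` never feeds a select signal, so its changes between the second and third cycles do not create a new coverage entry.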

This hardware use case also illustrates a key CGF design issue: coverage signals must be meaningful for the search objective. The ProcessorFuzz paper reports that DIFUZZRTL's register coverage can be inflated by datapath-related registers, such as a remainder register in a MulDiv module, even when those registers provide little information about the current hardware control state. ProcessorFuzz addresses this by introducing a CSR-transition coverage metric intended to guide processor fuzzing toward interesting processor states.
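
A transition-based metric of this kind can be sketched as the set of (old state, new state) pairs observed over a subset of watched registers. The trace format and the choice of `mstatus` and `mcause` as the watched CSRs are assumptions for illustration; the text above does not say which CSRs ProcessorFuzz monitors or how it encodes a transition.

```python
def csr_transitions(trace, csr_names):
    """Return the set of (old_state, new_state) pairs where a watched CSR changed."""
    transitions = set()
    previous = None
    for snapshot in trace:
        state = tuple(snapshot[name] for name in csr_names)
        if previous is not None and state != previous:
            transitions.add((previous, state))
        previous = state
    return transitions

# Hypothetical per-instruction snapshots; x5 stands in for a datapath register.
trace = [
    {"mstatus": 0x08, "mcause": 0, "x5": 1},
    {"mstatus": 0x08, "mcause": 0, "x5": 2},   # only datapath changed: ignored
    {"mstatus": 0x80, "mcause": 2, "x5": 2},   # CSR state changed: transition
    {"mstatus": 0x08, "mcause": 2, "x5": 3},   # CSR state changed: transition
]
seen = set()
new = csr_transitions(trace, ("mstatus", "mcause")) - seen
seen |= new
print(len(new))
```

An input would be kept when `new` is non-empty. Changes to `x5` alone never produce a transition, which mirrors the goal of ignoring datapath activity.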

Processor fuzzers also often need an oracle for semantic correctness rather than a simple crash signal. The ProcessorFuzz paper explains that semantic bugs are harder to detect than memory-safety violations because the violation condition is domain-specific. Processor fuzzers therefore use differential testing: the same input is run on both the RTL simulator and a reference ISA simulator, and a mismatch in the final architectural state indicates a potential processor bug.
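
The comparison step can be sketched as a field-by-field diff of two final-state dictionaries. The state layout, the register names, and the `ignore` parameter (for fields that might legitimately differ between simulators) are hypothetical; how each simulator exports its state is outside this sketch.

```python
def diff_state(rtl_state, isa_state, ignore=()):
    """Return {field: (rtl_value, isa_value)} for every field that disagrees."""
    mismatches = {}
    for name in sorted(set(rtl_state) | set(isa_state)):
        if name in ignore:
            continue
        rtl_value = rtl_state.get(name)
        isa_value = isa_state.get(name)
        if rtl_value != isa_value:
            mismatches[name] = (rtl_value, isa_value)
    return mismatches

# Hypothetical final architectural states after running the same input.
rtl_final = {"pc": 0x80000010, "x1": 5, "x2": 7}
isa_final = {"pc": 0x80000010, "x1": 5, "x2": 9}

report = diff_state(rtl_final, isa_final)
print(report)
```

A non-empty report flags the input for manual analysis; it is a candidate bug rather than a confirmed one, since the mismatch could also come from the reference model or the test harness.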

Practical considerations

CGF effectiveness depends heavily on the feedback metric. A metric that is too coarse may fail to distinguish useful behaviors, while a metric that is too noisy may retain inputs that expand the search space without moving toward meaningful states. The DIFUZZRTL and ProcessorFuzz comparison highlights this tradeoff in hardware fuzzing: register-value changes can produce coverage growth, but datapath-heavy signals may not correspond to control-state exploration.

Performance also matters because fuzzers evaluate many candidate inputs. ProcessorFuzz proposes using an ISA simulator as part of the coverage-feedback mechanism to identify interesting test inputs more rapidly; as a reference point, the paper notes that ISA simulation was 79× faster than RTL simulation for the BOOM RISC-V processor.
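
To make the effect of that speed gap concrete, the following back-of-the-envelope model compares running every input in RTL simulation against screening all inputs with the ISA simulator and simulating only the interesting ones in RTL. The 79× ratio comes from the text above; the staged pipeline, the input counts, and the unit costs are assumptions chosen for illustration.

```python
def simulation_cost(n_inputs, n_interesting, rtl_cost=79, isa_cost=1):
    """Return (baseline, staged) cost in units of one ISA-simulator run.

    baseline: every input is run in RTL simulation.
    staged:   every input is screened in the ISA simulator, and only the
              interesting ones are then run in RTL simulation.
    """
    baseline = n_inputs * rtl_cost
    staged = n_inputs * isa_cost + n_interesting * rtl_cost
    return baseline, staged

# Hypothetical campaign: 1000 candidates, 50 of which the screen marks interesting.
baseline, staged = simulation_cost(n_inputs=1000, n_interesting=50)
print(baseline, staged, round(baseline / staged, 1))
```

The benefit shrinks as the fraction of interesting inputs grows, which is one reason the selectivity of the coverage metric and the speed of the screening stage interact.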