Coverage-guided Fuzzing Wiki

Overview

Coverage-guided fuzzing (CGF) is a testing technique that uses execution feedback, most commonly code coverage, to steer input generation toward inputs that exercise more program behavior. Published research describes CGF as an effective technique that has detected many bugs across software applications, and characterizes it as focused on maximizing code coverage during fuzzing. The same research cautions that higher coverage does not necessarily imply better fault detection: triggering a bug can require both exercising a particular path and reaching an interesting program state on that path. [CGF definition and coverage caveat]
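
The coverage caveat can be illustrated with a toy example. The sketch below is purely illustrative and is not taken from any cited work: the function classify, the branch labels, and both input suites are hypothetical. It shows two suites that cover exactly the same branches, of which only one reaches the program state (b == 0) that triggers the fault.

```python
def classify(a: int, b: int, covered: set) -> int:
    """Hypothetical toy target: the fault depends on the value of b, not on the path."""
    if a > 0:
        covered.add("a_positive")
        return 100 // b  # raises ZeroDivisionError only when b == 0
    covered.add("a_non_positive")
    return 0


def run(inputs):
    """Run every input; return (set of branches covered, list of crashing inputs)."""
    covered, crashes = set(), []
    for a, b in inputs:
        try:
            classify(a, b, covered)
        except ZeroDivisionError:
            crashes.append((a, b))
    return covered, crashes


# Both suites take the same two branches, but only the second reaches b == 0.
suite_without_bug = [(1, 5), (-1, 5)]
suite_with_bug = [(1, 0), (-1, 5)]

cov_a, crashes_a = run(suite_without_bug)
cov_b, crashes_b = run(suite_with_bug)

print(cov_a == cov_b)   # identical branch coverage
print(crashes_a)        # no fault found
print(crashes_b)        # fault found on the same path
```

A fuzzer that rewards only new branch labels would see no reason to prefer the second suite over the first, which is the gap that state-aware or alternative feedback tries to close.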

Feedback signals

The central idea in CGF is to reward inputs that improve the feedback metric used by the fuzzer: an input that increases the metric is kept and mutated further, while one that does not is discarded. The most common feedback metric is code coverage, but research also investigates alternative or additional signals. One mutation-testing study proposes using mutation scores as feedback to guide fuzzing toward bug detection rather than only toward code coverage; in its evaluation on five benchmarks, the modified Zest-based techniques improved both code coverage and bug detection. [Mutation-score feedback]
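
The reward loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm of any particular fuzzer: the target function, its branch labels (b0, b1, b2), the byte-level mutate operator, and the iteration budget are all hypothetical, and the feedback metric is simply the set of branch labels an input reaches.

```python
import random


def target(data: bytes, covered: set) -> None:
    """Hypothetical program under test; records a label for each branch it takes."""
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b0")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b1")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("b2")
                raise RuntimeError("bug reached")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append one random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(seed_inputs, max_iterations=200_000, rng_seed=0):
    """Return (crashing input or None, set of branch labels seen)."""
    rng = random.Random(rng_seed)
    corpus = list(seed_inputs)
    global_coverage = set()
    for _ in range(max_iterations):
        candidate = mutate(rng.choice(corpus), rng)
        covered = set()
        try:
            target(candidate, covered)
        except RuntimeError:
            return candidate, global_coverage | covered
        # Reward step: keep the input only if it reached something new.
        if not covered <= global_coverage:
            global_coverage |= covered
            corpus.append(candidate)
    return None, global_coverage


crash, coverage = fuzz([b""])
print(crash, sorted(coverage))
```

Because inputs that reach a new branch are retained in the corpus, the three-byte condition is solved one byte at a time instead of requiring a single lucky guess. Swapping the coverage set for another signal, such as a mutation score, changes only the reward step.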

Use in instruction set simulator verification

The paper Verifying Instruction Set Simulators using Coverage-guided Fuzzing applies CGF to the verification of a RISC-V instruction set simulator (ISS). Its evaluation compares three categories of tests: hand-written RISC-V ISA tests, RISC-V Torture-generated tests, and coverage-guided fuzzing. Coverage is measured by instrumenting the ISS under test and includes branch coverage plus several functional-coverage metrics. [ISS evaluation setup]

In the reported table, the CGF run took 32,492 seconds and achieved 100.00% branch coverage. It also reached 100.00% for the R1, R2, and R3 functional-coverage columns, 98.21% for V(RS1), 100.00% for V(RS2), 81.13% for V(RD), 100.00% for V(I imm), and 100.00% for V(I shmt). The same row reports finding all seven listed ISS-under-test errors V1 through V7, as well as S1 in Spike and H1 and H2 in Forvis. [ISS CGF results]

The same evaluation shows the contrast with non-CGF test sources. RISC-V ISA tests ran in 2 seconds, reached 90.24% branch coverage, and found V1 through V3 in the ISS under test, while the RISC-V Torture configurations reached 74.30% branch coverage and found V1 and V2 in the ISS under test plus H2 in Forvis. [ISS comparison results]

Mutation and functional-coverage instrumentation

The ISS work includes a custom-mutation component: a mutation takes a bytestream, described as the instructions inside a testcase, and returns a modified bytestream. The same work maps functional-coverage trace information into fuzzer features, for example by adding features for specific ADDI instruction relationships or operand conditions. [ISS custom mutations and feature mapping]
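
The two ideas can be sketched as follows. This is an illustrative approximation, not the paper's implementation: the function names custom_mutate and addi_features, the single-bit-flip mutation, and the feature names are all assumptions chosen for the example. Only the ADDI field layout (opcode 0x13, funct3 0, rd in bits 11:7, rs1 in bits 19:15, a signed 12-bit immediate in bits 31:20) follows the RISC-V base ISA encoding.

```python
import random
import struct

ADDI_OPCODE = 0x13  # RISC-V OP-IMM major opcode; ADDI additionally has funct3 == 0


def custom_mutate(bytestream: bytes, rng: random.Random) -> bytes:
    """Hypothetical mutation: flip one bit in one 32-bit instruction word."""
    n_words = len(bytestream) // 4
    if n_words == 0:
        return bytestream
    # RISC-V instruction words are stored little-endian.
    words = list(struct.unpack(f"<{n_words}I", bytestream[: n_words * 4]))
    idx = rng.randrange(n_words)
    words[idx] ^= 1 << rng.randrange(32)
    return struct.pack(f"<{n_words}I", *words) + bytestream[n_words * 4:]


def addi_features(word: int) -> set:
    """Map one instruction word to hypothetical functional-coverage features."""
    features = set()
    if word & 0x7F != ADDI_OPCODE or (word >> 12) & 0x7 != 0:
        return features  # not an ADDI instruction
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:  # sign-extend the 12-bit immediate
        imm -= 0x1000
    features.add("addi")
    if rd == rs1:
        features.add("addi_rd_eq_rs1")
    if rd == 0:
        features.add("addi_rd_is_x0")
    if imm == 0:
        features.add("addi_imm_zero")
    if imm < 0:
        features.add("addi_imm_negative")
    return features


testcase = struct.pack("<2I", 0x00000013, 0xFFF08093)  # nop; addi x1, x1, -1
mutated = custom_mutate(testcase, random.Random(0))
print(sorted(addi_features(0xFFF08093)))
```

In a real setup the feature set would be derived from the simulator's execution trace and reported to the fuzzer alongside branch coverage, so that inputs hitting a new operand relationship are retained even when they add no new branches.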

Limitations and extensions

Two qualifications apply. First, the broader CGF literature warns that coverage is not the same as fault-detection capability, because faults can depend on program state as well as path reachability. Second, the ISS evaluation itself shows that even a strong CGF result may leave some metrics below 100%; in the reported CGF row, V(RD) is 81.13% while most other metrics are at or near 100%. [CGF definition and coverage caveat] [ISS CGF results]

CGF is also being adapted beyond conventional software targets. The FLARE preprint applies coverage-guided fuzzing to LLM-based multi-agent systems by extracting specifications and behavioral spaces from agent definitions, building test oracles, and fuzzing to expose failures. Its evaluation on 16 open-source applications reports 96.9% inter-agent coverage, 91.1% intra-agent coverage, and 56 previously unknown failures. [FLARE MAS fuzzing]