Coverage-Guided Fuzzing
Coverage-guided fuzzing (CGF) is a fuzz-testing technique in which coverage feedback guides the generation or selection of test inputs. The supplied public evidence describes CGF as an effective technique that has detected many bugs in software applications and that focuses on maximizing code coverage in order to reveal more bugs during fuzzing.[C1]
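The feedback loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `target` function, its hand-placed branch IDs, the `mutate` operator, and the `fuzz` driver are hypothetical and are not taken from any of the cited tools. Real fuzzers obtain coverage through compiler or runtime instrumentation rather than manual bookkeeping.

```python
import random

def target(data: bytes, trace: set) -> None:
    """Hypothetical program under test; `trace` collects the branch IDs it reaches."""
    trace.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        trace.add("b1")
        if len(data) > 1 and data[1] == ord("U"):
            trace.add("b2")
            if len(data) > 2 and data[2] == ord("Z"):
                trace.add("b3")
                raise RuntimeError("bug reached")

def run(data: bytes):
    """Execute the target once; return (covered branch IDs, whether it crashed)."""
    trace = set()
    try:
        target(data, trace)
    except RuntimeError:
        return trace, True
    return trace, False

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append a random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.7:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)

def fuzz(seeds, iterations=100_000, rng_seed=0):
    """Keep a mutated input only if it reaches a branch no earlier input reached."""
    rng = random.Random(rng_seed)
    corpus, global_cov, crashes = list(seeds), set(), []
    for seed in corpus:  # record the coverage of the initial seeds first
        global_cov |= run(seed)[0]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        trace, crashed = run(candidate)
        if crashed:
            crashes.append(candidate)
        if not trace <= global_cov:  # new coverage: the input is "interesting"
            global_cov |= trace
            corpus.append(candidate)
    return corpus, global_cov, crashes
```

Calling `fuzz([b""])` starts from an empty seed. Because each retained input gets one branch deeper, the loop can assemble the crashing prefix byte by byte instead of having to guess all three bytes at once.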
A central limitation is that higher coverage does not necessarily imply better fault-detection capability. The supplied mutation-testing evidence explains that triggering a bug requires not only exercising a specific program path, but also reaching interesting program states along that path.[C1]
Feedback beyond raw code coverage
Because code coverage alone can be an imperfect proxy for bug discovery, one supported line of work augments CGF with mutation testing. The paper "Investigating Coverage Guided Fuzzing with Mutation Testing" proposes using mutation scores as feedback so that fuzzing is guided toward detecting bugs rather than only covering code. In its evaluation, the authors use Zest as the baseline, build two modified techniques on top of it, and report improvements in both code coverage and bug detection across five benchmarks.[C1]
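To make the idea of mutation-score feedback concrete, the following is a minimal sketch and not the paper's method or Zest's implementation. The function `original`, the three hand-written entries in `MUTANTS`, and the driver `fuzz_with_mutation_feedback` are all hypothetical. The sketch assumes an input "kills" a mutant when the mutant's output differs from the original's, and it retains an input only if it kills a mutant that no earlier input killed.

```python
import random

def original(x: int) -> int:
    """Hypothetical function under test."""
    return x * 2 if x > 10 else x

# Hand-written mutants (assumed for illustration); each changes one operator or constant.
MUTANTS = {
    "gt_to_ge": lambda x: x * 2 if x >= 10 else x,
    "mul_to_add": lambda x: x + 2 if x > 10 else x,
    "const_10_to_0": lambda x: x * 2 if x > 0 else x,
}

def killed_by(x: int) -> set:
    """Names of the mutants whose output differs from the original on input x."""
    expected = original(x)
    return {name for name, mutant in MUTANTS.items() if mutant(x) != expected}

def fuzz_with_mutation_feedback(seeds, iterations=5000, rng_seed=0):
    """Retain an input only when it kills a previously surviving mutant."""
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    killed = set()
    for seed in corpus:
        killed |= killed_by(seed)
    for _ in range(iterations):
        candidate = rng.choice(corpus) + rng.randint(-20, 20)
        newly_killed = killed_by(candidate) - killed
        if newly_killed:
            killed |= newly_killed
            corpus.append(candidate)
    mutation_score = len(killed) / len(MUTANTS)
    return corpus, killed, mutation_score
```

In this toy setting the inputs 0 and 11 already execute both branches of `original`, yet the boundary mutant `gt_to_ge` survives until the input 10 is generated. That gap between full branch coverage and a full mutation score is the kind of signal this feedback is meant to expose.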
Coverage-guided fuzzing for LLM-based multi-agent systems
The supplied FLARE evidence extends CGF to LLM-based multi-agent systems (MAS). FLARE takes MAS source code as input, extracts specifications and behavioral spaces from agent definitions, builds test oracles, and conducts coverage-guided fuzzing to expose failures. It then analyzes execution logs to determine whether tests pass and to generate failure reports.[C2]
In the reported evaluation on 16 open-source MAS applications, FLARE achieved 96.9% inter-agent coverage and 91.1% intra-agent coverage, outperforming baselines by 9.5% and 1.0%, respectively, and uncovered 56 previously unknown MAS-specific failures.[C2]
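The supplied evidence does not define how inter-agent and intra-agent coverage are computed, so the sketch below rests on stated assumptions: inter-agent coverage is taken to be the fraction of expected agent-to-agent handoffs observed in execution logs, and intra-agent coverage the fraction of expected per-agent actions observed. The `BehaviorSpace` and `CoverageTracker` classes and the log event format are hypothetical and do not reflect FLARE's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorSpace:
    """Assumed behavioral space extracted from agent definitions."""
    handoffs: set  # expected (sender, receiver) pairs
    actions: set   # expected (agent, action) pairs

@dataclass
class CoverageTracker:
    space: BehaviorSpace
    seen_handoffs: set = field(default_factory=set)
    seen_actions: set = field(default_factory=set)

    def record(self, log) -> bool:
        """Fold one execution log into the tracker; return True on new coverage.

        Each log event is a (kind, a, b) tuple, where kind is "handoff"
        (a = sender, b = receiver) or "action" (a = agent, b = action name).
        """
        before = len(self.seen_handoffs) + len(self.seen_actions)
        for kind, a, b in log:
            if kind == "handoff" and (a, b) in self.space.handoffs:
                self.seen_handoffs.add((a, b))
            elif kind == "action" and (a, b) in self.space.actions:
                self.seen_actions.add((a, b))
        return len(self.seen_handoffs) + len(self.seen_actions) > before

    def inter_agent(self) -> float:
        """Fraction of expected handoffs observed so far."""
        return len(self.seen_handoffs) / len(self.space.handoffs)

    def intra_agent(self) -> float:
        """Fraction of expected per-agent actions observed so far."""
        return len(self.seen_actions) / len(self.space.actions)
```

A fuzzing loop could use the boolean returned by `record` in the same way a code-coverage fuzzer uses new-branch feedback, retaining only those test inputs whose logs exercise a handoff or action not seen before.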
Hardware fuzzing example: GoldenFuzz
The supplied GoldenFuzz evidence shows CGF-style coverage evaluation in a RISC-V hardware-fuzzing setting. GoldenFuzz evaluates condition coverage, line coverage, and FSM coverage across three RISC-V cores: RocketChip, BOOM, and CVA6. For DifuzzRTL and TheHuzz, the supplied excerpt states that the analysis is limited to condition coverage on RocketChip due to page limits.[C3]
GoldenFuzz uses the Spike simulator as a golden reference model during profiling, while the device-under-test implementations include RocketChip, BOOM, and CVA6. Its vulnerability-detection workflow combines Synopsys VCS hardware simulation traces with Spike reference traces. VCS records register updates and memory operations at instruction boundaries, while Spike produces expected register and memory states for RISC-V binaries.[C3]
GoldenFuzz identifies potential bugs or vulnerabilities by comparing VCS hardware traces against Spike execution traces. Its mismatch detector checks discrepancies in register values, memory addresses, and memory contents; any mismatch is flagged for manual confirmation by the user.[C3]
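The comparison described above can be sketched as a simple differential check. This is an illustration under assumptions, not GoldenFuzz's implementation: the `Step` record and the `find_mismatches` function are hypothetical, and the sketch does not parse real VCS or Spike output. It assumes both traces have already been converted into one record per retired instruction.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    """One instruction boundary in a trace (hypothetical record format)."""
    pc: int
    regs: dict = field(default_factory=dict)  # register name -> value written
    mem_addr: Optional[int] = None            # address of a memory operation, if any
    mem_data: Optional[int] = None            # data of that memory operation, if any

def find_mismatches(dut, ref):
    """Compare a device-under-test trace against a reference trace step by step.

    Returns tuples of (step index, kind, detail, dut value, reference value).
    """
    mismatches = []
    for i, (d, r) in enumerate(zip(dut, ref)):
        for name in sorted(set(d.regs) | set(r.regs)):
            if d.regs.get(name) != r.regs.get(name):
                mismatches.append((i, "register", name, d.regs.get(name), r.regs.get(name)))
        if d.mem_addr != r.mem_addr:
            mismatches.append((i, "mem_addr", None, d.mem_addr, r.mem_addr))
        elif d.mem_data != r.mem_data:
            # Same address but different contents.
            mismatches.append((i, "mem_data", d.mem_addr, d.mem_data, r.mem_data))
    if len(dut) != len(ref):
        # One trace ended early; report where the shorter one stops.
        mismatches.append((min(len(dut), len(ref)), "length", None, len(dut), len(ref)))
    return mismatches
```

Each returned tuple is only a candidate finding. As in the workflow described above, a mismatch would still need manual confirmation, since a discrepancy may stem from the trace setup rather than from a hardware bug.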
Key takeaway
The supplied evidence supports a view of CGF as a broadly applicable feedback-driven testing approach. However, it also shows that the choice of feedback metric matters: raw coverage can help exploration, but bug-finding may require additional signals such as mutation scores, agent-interaction coverage, hardware condition/line/FSM coverage, or reference-model mismatches.[C1][C2][C3]