Mutation-based Fuzzing

Mutation-based fuzzing is a fuzzing technique that explores a system's behavior by repeatedly mutating existing test artifacts. In the provided evidence, those artifacts include RISC-V instruction test vectors in coverage-guided processor verification, counterexample candidates for reactive-system model learning, and prompt templates for LLM jailbreak testing. [Mutation-based fuzzing scope]

Relationship to coverage-guided fuzzing

Mutation-based fuzzing is closely connected to coverage-guided fuzzing in the supplied processor-verification evidence. Herdt et al. proposed using state-of-the-art coverage-guided fuzzing for instruction-set-simulator (ISS) verification, adding a functional coverage metric and a mutation procedure tailored to ISS verification. Their implementation was built on LLVM libFuzzer and evaluated on three publicly available RISC-V ISSs. The authors report that the fuzzer was effective at maximizing most coverage metrics and finding errors, including new errors in every considered ISS and one in the official RISC-V reference simulator Spike. [ISS-tailored coverage-guided mutation]

The same work characterizes fuzzing as particularly useful for triggering and checking corner cases and error cases, and as complementary to other testcase generation techniques. [Fuzzing complements testcase generation]

Processor-verification mutation examples

In cross-level processor verification using AFL, Bruns et al. describe domain-specific mutations for RISC-V test vectors. A Fast Exploration mutation prephase inserts RISC-V instructions at the beginning of a test vector, with every register argument fixed to x0 and every immediate fixed to 0; the example given is addi x0, x0, 0. After each insertion, the fuzzer executes the new test vector and saves it only if it increases coverage, which the authors describe as a way to limit the state space and avoid state-space explosion. The prephase then applies bitflip mutation to cover the possible arguments and uncover unknown instructions, and repeats both steps until no new test vectors are found. [Fast Exploration mutation]
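The prephase described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' AFL implementation: the names toy_coverage, KNOWN, and fast_exploration are hypothetical, and the coverage signal is a stand-in that only records which (opcode, funct3) pairs a pretend decoder recognises. The only encoding taken from the text is addi x0, x0, 0, which is the 32-bit word 0x00000013.

```python
import struct

ADDI_X0_X0_0 = 0x00000013  # addi x0, x0, 0: rd = rs1 = x0, imm = 0
ZERO_ARG_INSTRUCTIONS = [ADDI_X0_X0_0]  # illustrative; a real list would be longer

# Hypothetical decoder: the (opcode, funct3) pairs the toy simulator recognises.
KNOWN = {(0x13, f3) for f3 in range(8)} | {(0x33, f3) for f3 in range(8)}


def toy_coverage(vector):
    """Stand-in coverage: recognised (opcode, funct3) pairs in the vector."""
    points = set()
    for offset in range(0, len(vector) - 3, 4):
        (word,) = struct.unpack_from("<I", vector, offset)
        key = (word & 0x7F, (word >> 12) & 0x7)
        if key in KNOWN:
            points.add(key)
    return points


def bitflips(vector):
    """Yield every single-bit flip of the vector."""
    for i in range(len(vector) * 8):
        mutated = bytearray(vector)
        mutated[i // 8] ^= 1 << (i % 8)
        yield bytes(mutated)


def fast_exploration(seed=b"", coverage=toy_coverage):
    corpus, seen = [seed], set(coverage(seed))

    def try_save(candidate):
        # Save a candidate only if it increases coverage.
        new_points = coverage(candidate) - seen
        if new_points:
            seen.update(new_points)
            corpus.append(candidate)
            return True
        return False

    found = True
    while found:  # repeat until a full round saves no new test vector
        found = False
        # Step 1: insert zero-argument instructions at the beginning.
        for vector in list(corpus):
            for word in ZERO_ARG_INSTRUCTIONS:
                found |= try_save(struct.pack("<I", word) + vector)
        # Step 2: bitflip saved vectors to reach other arguments and opcodes.
        for vector in list(corpus):
            for candidate in bitflips(vector):
                found |= try_save(candidate)
    return corpus, seen
```

Calling fast_exploration() starts from an empty test vector. Because a candidate is saved only when it adds coverage, the corpus stays small even though each round tries every single-bit flip of every saved vector.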

Bruns et al. also extend AFL's havoc mutation, which they describe as a combination of single mutations applied at random positions. Their extension inserts RISC-V instructions whose arguments are not fixed to zero, and the insertion also supports compressed instructions. They further add a replacement variant that leaves the test-vector size unchanged. [Enhanced Havoc mutation]
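The two additions to havoc can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, only addi and its compressed form c.addi are generated, and positions are restricted to 2-byte boundaries so that compressed instructions stay aligned. The bit layouts follow the standard RISC-V I-type and compressed CI formats.

```python
import random
import struct


def encode_addi(rd, rs1, imm):
    """RV32I I-type encoding of addi rd, rs1, imm (12-bit immediate)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13


def encode_c_addi(rd, imm):
    """RVC CI-format encoding of c.addi rd, imm (6-bit immediate)."""
    return (((imm >> 5) & 1) << 12) | (rd << 7) | ((imm & 0x1F) << 2) | 0x1


def random_instruction(rng):
    """Return a 16-bit or 32-bit instruction with randomly chosen arguments."""
    if rng.random() < 0.5:
        word = encode_c_addi(rng.randrange(32), rng.randrange(-32, 32))
        return struct.pack("<H", word)
    word = encode_addi(rng.randrange(32), rng.randrange(32),
                       rng.randrange(-2048, 2048))
    return struct.pack("<I", word)


def havoc_insert(vector, rng):
    """Insert a random instruction at a random 2-byte-aligned position."""
    instr = random_instruction(rng)
    pos = rng.randrange(0, len(vector) + 1, 2)
    return vector[:pos] + instr + vector[pos:]


def havoc_replace(vector, rng):
    """Overwrite bytes with a random instruction; the size stays the same."""
    instr = random_instruction(rng)
    if len(instr) > len(vector):
        return vector  # nothing to replace without growing the vector
    pos = rng.randrange(0, len(vector) - len(instr) + 1, 2)
    return vector[:pos] + instr + vector[pos + len(instr):]
```

A caller would pass a seeded generator such as random.Random(0) to make runs reproducible. The replacement variant is the one that preserves test-vector length, matching the size-preserving mutation described above.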

In the fuzzing results table shown in the evidence, Enhanced AFL found more unique crashes than Vanilla AFL: a mean of 274.43 versus 237.36, and a sum of 2021 versus 1619. [Enhanced AFL results]

Other uses in the evidence

Model learning

In model learning for reactive software systems, mutation-based fuzzing has been compared and combined with conformance testing to obtain counterexamples in the Minimally Adequate Teacher framework. In the RERS challenge setting summarized in the evidence, the fuzzer found no additional counterexamples for the LTL problems, but for reachability problems it discovered more reachable error states than the learner and tester in some cases; the authors conclude that the approaches are orthogonal and complementary in model learning. [Model learning counterexamples]
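As a rough illustration of how a fuzzer can supply counterexamples to a learner, the Python sketch below mutates input sequences until the system's outputs disagree with the current hypothesis. Everything in it is hypothetical: the evidence does not describe the fuzzer's internals, and system_under_learning, hypothesis, and the two-symbol alphabet are toy stand-ins for a real reactive system and a learned model.

```python
import random

ALPHABET = ["a", "b"]


def system_under_learning(inputs):
    """Toy black box: reports 'err' once three consecutive 'b' inputs occur."""
    outputs, run = [], 0
    for symbol in inputs:
        run = run + 1 if symbol == "b" else 0
        outputs.append("err" if run >= 3 else "ok")
    return outputs


def hypothesis(inputs):
    """Toy learned model that has not yet discovered the error state."""
    return ["ok"] * len(inputs)


def mutate(inputs, rng):
    """Insert, replace, or delete one input symbol at a random position."""
    inputs = list(inputs)
    choice = rng.choice(["insert", "replace", "delete"])
    if choice == "insert" or not inputs:
        inputs.insert(rng.randrange(len(inputs) + 1), rng.choice(ALPHABET))
    elif choice == "replace":
        inputs[rng.randrange(len(inputs))] = rng.choice(ALPHABET)
    else:
        del inputs[rng.randrange(len(inputs))]
    return inputs


def find_counterexample(seeds, rng, budget=10_000):
    """Return an input on which system and hypothesis disagree, or None."""
    corpus = [list(seed) for seed in seeds]
    for _ in range(budget):
        candidate = mutate(rng.choice(corpus), rng)
        if system_under_learning(candidate) != hypothesis(candidate):
            return candidate  # counterexample to hand back to the learner
        corpus.append(candidate)
    return None
```

For example, find_counterexample([["a"]], random.Random(0)) searches from a single one-symbol seed. In a Minimally Adequate Teacher setting, a returned sequence would play the role of the teacher's answer to an equivalence query.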

LLM jailbreak testing

TurboFuzzLLM is described as a mutation-based fuzzing technique for efficiently finding jailbreaking templates for large language models using only black-box access through prompts. According to the summarized paper, the technique automatically generates effective jailbreaking templates, achieves attack success rates of at least 95% on public datasets for leading LLMs including GPT-4o and GPT-4 Turbo, generalizes to unseen harmful questions, and can help improve model defenses against prompt attacks. [TurboFuzzLLM mutation-based fuzzing]

Practical pattern

Across these examples, mutation-based fuzzing follows a common pattern: choose a mutable representation of the input space, apply mutations that are either generic or domain-specific, and retain or evaluate the mutated artifacts according to the testing objective. In coverage-guided processor verification, the feedback is coverage increase and crash or mismatch discovery; in model learning, it is counterexample discovery; in LLM jailbreak testing, it is successful generation of harmful responses from mutated templates. [Mutation-based fuzzing scope]
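The pattern can be written as a generic loop in which the mutation and the objective are both pluggable. The Python sketch below is illustrative only: fuzz, mutate_bytes, and make_prefix_evaluator are hypothetical names, and the prefix-matching objective is a toy stand-in for coverage, counterexample, or attack-success feedback.

```python
import random


def mutate_bytes(data, rng):
    """Generic mutation: flip one random bit, or append a random byte."""
    data = bytearray(data)
    if data and rng.random() < 0.7:
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)
    else:
        data.append(rng.randrange(256))
    return bytes(data)


def make_prefix_evaluator(magic=b"FUZZ"):
    """Toy objective: keep inputs that match a longer prefix of `magic`."""
    best = 0

    def evaluate(candidate):
        nonlocal best
        matched = 0
        for got, want in zip(candidate, magic):
            if got != want:
                break
            matched += 1
        keep = matched > best  # feedback: did this input make progress?
        best = max(best, matched)
        finding = candidate if matched == len(magic) else None
        return keep, finding

    return evaluate


def fuzz(seeds, mutate, evaluate, rng, budget=1000):
    """Mutate corpus entries, record findings, retain what the objective keeps."""
    corpus, findings = list(seeds), []
    for _ in range(budget):
        candidate = mutate(rng.choice(corpus), rng)
        keep, finding = evaluate(candidate)
        if finding is not None:
            findings.append(finding)
        if keep:
            corpus.append(candidate)
    return corpus, findings
```

A run such as fuzz([b""], mutate_bytes, make_prefix_evaluator(), random.Random(0)) wires the pieces together. Swapping mutate for a domain-specific mutator, or evaluate for a different objective, changes the application without changing the loop.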