STIMSMITH

Machine Learning for Fuzzing

Technique WIKI v2 · 5/29/2026

Machine Learning for Fuzzing is the use of machine-learning techniques to improve fuzz testing, for example by guiding input mutation, increasing code coverage, bypassing verification checks, and learning failure-inducing models. The evidence describes it as an active research direction: a systematic review reports performance improvements over traditional fuzzing in some studies, FuzzSDN applies the idea to software-defined networks (SDNs), and a paper on coverage-guided fuzzing of instruction set simulators names it as future work.

Overview

Machine Learning for Fuzzing is the integration of machine-learning techniques into fuzz testing. A 2019 systematic review characterizes this line of work as a response to challenges in traditional fuzzing, including how to mutate seed inputs, increase code coverage, and bypass verification checks. The review reports that machine learning has been used across multiple stages of fuzzing and analyzes work in terms of algorithm selection, preprocessing, datasets, evaluation metrics, and hyperparameter settings.
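
As an illustration of how a learned component can address the seed-mutation challenge, the sketch below uses an epsilon-greedy bandit to choose among mutation operators based on how often each one has produced new coverage. It is not taken from the review; the class name, the operator names, and the 0/1 reward signal are assumptions made for this example.

```python
import random

class EpsilonGreedyMutatorPicker:
    """Hypothetical helper: learns which mutation operator pays off most often."""

    def __init__(self, operators, epsilon=0.1, rng=None):
        self.operators = list(operators)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {op: 0 for op in self.operators}
        self.mean_reward = {op: 0.0 for op in self.operators}

    def pick(self):
        # Explore with probability epsilon, otherwise exploit the best-known operator.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        # On ties, max() returns the first operator in the list.
        return max(self.operators, key=lambda op: self.mean_reward[op])

    def update(self, op, reward):
        # Incremental running mean of the reward observed for this operator.
        self.pulls[op] += 1
        self.mean_reward[op] += (reward - self.mean_reward[op]) / self.pulls[op]

# Assumed reward: 1.0 if the mutated input reached new coverage, else 0.0.
picker = EpsilonGreedyMutatorPicker(["bit_flip", "byte_insert"], epsilon=0.1,
                                    rng=random.Random(0))
picker.update("bit_flip", 1.0)
picker.update("byte_insert", 0.0)
next_operator = picker.pick()
```

A fuzzer would call pick() before each mutation and update() after running the mutated input, so the operator choice shifts toward whatever has recently been productive.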

Reported uses and benefits

The reviewed literature describes machine learning as a way to improve both the fuzzing process and its results. According to the systematic review, the evaluated machine-learning models showed acceptable classification and prediction capability for fuzzing, and comparisons between traditional and machine-learning-based fuzzing tools indicated that introducing machine learning can improve fuzzing performance.

The same review also identifies limitations. In particular, it notes unbalanced training samples and the difficulty of extracting characteristics related to vulnerabilities as open problems for machine-learning-based fuzzing.
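
To make the unbalanced-sample problem concrete, the sketch below computes inverse-frequency class weights for a made-up corpus in which crashing inputs are rare. This is one common mitigation, not a remedy prescribed by the review; the labels and counts are assumptions chosen for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class so that all classes contribute equally in total."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical training set: 95 benign inputs and 5 crashing inputs.
labels = ["benign"] * 95 + ["crash"] * 5
weights = inverse_frequency_weights(labels)
# Each crashing sample now counts 19 times as much as a benign one,
# so a model cannot score well by predicting "benign" for everything.
```

Weighting changes only how much each sample counts during training. It does not address the second limitation, the difficulty of extracting vulnerability-related characteristics.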

Example: machine-learning-guided fuzzing for SDNs

One concrete example in the provided evidence is FuzzSDN, a machine-learning-guided fuzzing method for software-defined networks. FuzzSDN has two stated goals: generating effective test data that leads to failures in SDN-based systems, and learning failure-inducing models that characterize the conditions under which such systems fail. It was evaluated on systems controlled by two open-source SDN controllers. For a controller described as fairly robust to fuzzing, FuzzSDN was reported to generate at least 12 times more failures than state-of-the-art SDN fuzzing methods within the same time budget. Its learned failure-inducing models were reported to achieve, on average, a precision of 98% and a recall of 86%.
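
The precision and recall figures can be read as follows: precision is the share of predicted failures that were real failures, and recall is the share of real failures that the model predicted. The sketch below evaluates a hypothetical failure-inducing rule against made-up observations. The rule, the field names version and length, and the data are assumptions for illustration and are not taken from FuzzSDN.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted failure, did fail
    fp = sum(1 for p, a in pairs if p and not a)    # predicted failure, did not fail
    fn = sum(1 for p, a in pairs if not p and a)    # missed failure
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def predicts_failure(message):
    """Hypothetical learned rule: a condition over message fields."""
    return message["version"] > 4 and message["length"] < 8

# Made-up observations: (message fields, whether the system actually failed).
observations = [
    ({"version": 5, "length": 4}, True),
    ({"version": 6, "length": 2}, True),
    ({"version": 5, "length": 6}, False),   # rule fires without a failure
    ({"version": 1, "length": 4}, True),    # failure the rule misses
    ({"version": 1, "length": 64}, False),
    ({"version": 4, "length": 64}, False),
]
predicted = [predicts_failure(msg) for msg, _ in observations]
actual = [failed for _, failed in observations]
precision, recall = precision_recall(predicted, actual)
```

In this toy data the rule makes one false alarm and misses one failure, so both precision and recall come out at two thirds.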

Context in instruction set simulator verification

The technique is also mentioned in Verifying Instruction Set Simulators using Coverage-guided Fuzzing by Herdt et al. That paper focuses on coverage-guided fuzzing for instruction set simulator verification, implemented as extensions on top of libFuzzer and evaluated on three publicly available RISC-V instruction set simulators. Machine learning is not an implemented component of the proposed approach. Instead, the authors cite recent interest in integrating machine-learning techniques into fuzzing and describe it as a promising future direction for their application area.
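
For readers unfamiliar with coverage-guided fuzzing, the loop below is a minimal sketch of the general idea: mutate an input from the corpus, run the target, and keep the input only if it reaches coverage not seen before. It is a toy in plain Python and does not use libFuzzer or reproduce the paper's extensions; toy_target, its branch names, and the single-byte mutation are assumptions made for this example.

```python
import random

def toy_target(data: bytes) -> set:
    """Hypothetical instrumented target: returns the set of branch IDs hit."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("R"):
        covered.add("first_byte_R")
        if len(data) > 1 and data[1] == ord("V"):
            covered.add("second_byte_V")
    return covered

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte, or append one byte when the input is empty."""
    buf = bytearray(data)
    if not buf:
        buf.append(rng.randrange(256))
    else:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seeds, iterations, rng):
    """Grow the corpus with every input that reaches new coverage."""
    corpus = list(seeds)
    total_coverage = set()
    for seed in corpus:
        total_coverage |= toy_target(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_coverage = toy_target(candidate) - total_coverage
        if new_coverage:  # keep only inputs that reach something new
            corpus.append(candidate)
            total_coverage |= new_coverage
    return corpus, total_coverage

corpus, coverage = fuzz([b"AA"], iterations=20000, rng=random.Random(0))
```

A machine-learning component would typically plug into the mutate step or the choice of which corpus entry to mutate next, replacing the uniform random choices used here.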

The same paper also notes related processor-level stimulus-generation work using Bayesian networks and other machine-learning techniques, placing machine learning among prior approaches for improving random generation of processor-level stimuli.

Evidence-backed scope

From the available evidence, the technique should be understood as a broad research direction rather than a single algorithm. The supported claims are that machine learning has been applied to multiple fuzzing stages, can guide or improve fuzzing in some evaluated settings, can be used to learn failure-inducing models, and remains subject to data and feature-extraction limitations.

CITATIONS

[1] Machine learning is introduced into fuzzing to address challenges such as seed mutation, increasing code coverage, and bypassing verification, and has been studied across multiple stages of the fuzzing process. Source: A systematic review of fuzzing based on machine learning techniques
[2] The systematic review analyzes machine-learning-based fuzzing work by algorithm selection, preprocessing methods, datasets, evaluation metrics, and hyperparameter settings. Source: A systematic review of fuzzing based on machine learning techniques
[3] The systematic review reports that machine learning can improve fuzzing performance, while also noting limitations such as unbalanced training samples and difficulty extracting vulnerability-related characteristics. Source: A systematic review of fuzzing based on machine learning techniques
[4] FuzzSDN is a machine-learning-guided fuzzing method for SDN-based systems that aims to generate failure-inducing test data and learn failure-inducing models. Source: Learning Failure-Inducing Models for Testing Software-Defined Networks
[5] FuzzSDN was evaluated on systems controlled by two open-source SDN controllers and was reported to generate at least 12 times more failures than state-of-the-art methods in one robust-controller setting, with learned models averaging 98% precision and 86% recall. Source: Learning Failure-Inducing Models for Testing Software-Defined Networks
[6] In instruction set simulator verification, Herdt et al. implemented a coverage-guided fuzzing approach with extensions on top of libFuzzer and evaluated it on three publicly available RISC-V instruction set simulators. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[7] The instruction set simulator verification paper treats integrating machine-learning techniques into fuzzing as promising future work rather than as part of its implemented approach. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing
[8] The instruction set simulator verification paper cites Bayesian networks and other machine-learning techniques as prior approaches for improving random generation of processor-level stimuli. Source: Verifying Instruction Set Simulators using Coverage-guided Fuzzing

VERSION HISTORY

v2 · 5/29/2026 · gpt-5.5 (current)
v1 · 5/28/2026 · gpt-5.5