Machine Learning for Fuzzing Wiki

Overview

Machine learning for fuzzing is the integration of machine-learning techniques into fuzz testing. A 2019 systematic review characterizes this line of work as a response to challenges in traditional fuzzing: how to mutate seed inputs, how to increase code coverage, and how to bypass verification checks. The review reports that machine learning has been used at multiple stages of the fuzzing process, and it analyzes the surveyed work in terms of algorithm selection, preprocessing, datasets, evaluation metrics, and hyperparameter settings.

Reported uses and benefits

The reviewed literature describes machine learning as a way to improve the fuzzing process and its results. According to the systematic review, evaluated machine-learning models showed acceptable capability for predictive categorization in fuzzing, and comparisons between traditional fuzzing tools and machine-learning-based fuzzing tools indicated that introducing machine learning can improve fuzzing performance.

The same review also identifies limitations. In particular, it notes unbalanced training samples and the difficulty of extracting characteristics related to vulnerabilities as open problems for machine-learning-based fuzzing.
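The review names unbalanced training samples as an open problem but the summary here does not describe a remedy. As a minimal sketch of one common mitigation, and not a method attributed to the review, the example below computes inverse-frequency class weights so that a rare class (for instance, inputs that triggered a crash) contributes as much total weight to training as the common class. The function name and the label data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so each class ends up with the
    same total weight across the training set.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels: 1 = input triggered a failure, 0 = it did not.
# The 95/5 split is made up to mimic a heavily unbalanced fuzzing dataset.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these made-up labels, each failing sample is weighted 10.0 and each non-failing sample roughly 0.53, so both classes carry a total weight of 50. Whether weighting of this kind helps in a given fuzzing setting is an empirical question that the sketch does not answer.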

Example: machine-learning-guided fuzzing for SDNs

One concrete example is FuzzSDN, a machine-learning-guided fuzzing method for software-defined networks (SDNs). FuzzSDN has two stated goals: generating effective test data that leads to failures in SDN-based systems, and learning failure-inducing models that characterize the conditions under which such systems fail. It was evaluated on systems controlled by two open-source SDN controllers. For the controller described as fairly robust to fuzzing, FuzzSDN was reported to generate at least 12 times more failures than state-of-the-art SDN fuzzing methods within the same time budget. Its learned failure-inducing models were reported to achieve, on average, a precision of 98% and a recall of 86%.
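To make the two reported metrics concrete, the following sketch computes precision and recall for a failure-inducing model from its predictions. It is not FuzzSDN's evaluation code, and the prediction data are invented for illustration. Precision is the share of inputs predicted to fail that actually failed; recall is the share of actual failures that the model predicted.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) for boolean failure predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted failure, did fail
    fp = sum(1 for p, a in pairs if p and not a)    # predicted failure, did not fail
    fn = sum(1 for p, a in pairs if not p and a)    # missed failure
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up outcomes for eight test inputs (True = the system failed).
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, False, False, False, False, False, False]

precision, recall = precision_recall(predicted, actual)
print(precision, recall)  # prints "1.0 0.5"
```

In this toy case the model never raises a false alarm (precision 1.0) but misses half of the real failures (recall 0.5). The figures reported for FuzzSDN show the same pattern in a milder form: precision is higher than recall.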

Context in instruction set simulator verification

Machine learning for fuzzing is also mentioned in the paper "Verifying Instruction Set Simulators using Coverage-guided Fuzzing". That paper focuses on coverage-guided fuzzing for instruction set simulator verification, implemented as extensions on top of libFuzzer and evaluated on three publicly available RISC-V instruction set simulators. Machine learning is not an implemented component of the proposed approach. Instead, the authors cite recent interest in integrating machine-learning techniques into fuzzing and describe it as a promising future direction for their application area.

The same paper also notes related processor-level stimulus-generation work using Bayesian networks and other machine-learning techniques, placing machine learning among prior approaches for improving random generation of processor-level stimuli.
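For readers unfamiliar with coverage-guided fuzzing, the loop below is a minimal sketch of the general idea: mutate an input from the corpus, run the target, and keep the mutated input only if it reaches code not seen before. It is not libFuzzer and not the paper's implementation. The toy target, its branch IDs, and the single-byte mutator are all hypothetical stand-ins. The uniform seed choice marks one place where, in principle, a learned model could be substituted.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical program under test; returns IDs of the branches it executed."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add(3)
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append a random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(seeds, iterations, rng):
    """Return (corpus, total_coverage) after a fixed number of iterations."""
    corpus = list(seeds)
    total_coverage = set()
    for seed in corpus:
        total_coverage |= toy_target(seed)
    for _ in range(iterations):
        parent = rng.choice(corpus)  # uniform choice; a learned model could rank seeds here
        candidate = mutate(parent, rng)
        new_branches = toy_target(candidate) - total_coverage
        if new_branches:  # keep only inputs that reach unseen branches
            corpus.append(candidate)
            total_coverage |= new_branches
    return corpus, total_coverage


corpus, coverage = fuzz([b""], 20000, random.Random(0))
print(len(corpus), sorted(coverage))
```

Because each retained input must add coverage, the corpus stays small and acts as a set of stepping stones toward deeper branches. How many branches the run reaches depends on the random seed and the iteration budget.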

Evidence-backed scope

Based on the sources summarized above, machine learning for fuzzing is best understood as a broad research direction rather than a single algorithm. The supported claims are that machine learning has been applied at multiple fuzzing stages, can guide or improve fuzzing in some evaluated settings, can be used to learn failure-inducing models, and remains limited by unbalanced training data and the difficulty of extracting vulnerability-related features.