Coverage-directed test generation wiki

Overview

Coverage-directed test generation (CDG) is a test-generation technique in which coverage information is used as feedback to steer the generation or selection of additional tests. In hardware verification, the motivation is to reach uncovered RTL regions or code branches automatically rather than relying only on manual constraint tuning or unguided random tests. Published examples include CDG systems that use large language models to generate Verilog stimuli for unexplored code branches, and systems that use reinforcement learning to explore behavioral models with rewards based on coverage feedback.

Relationship to random testing

In processor verification, random instruction generators have been widely used because they require limited human expertise and scale to large RTL designs. However, the ProcessorFuzz work describes a key limitation: without coverage guidance, these tools can generate repetitive inputs that exercise the same processor functionality, reducing the chance of finding bugs. Engineers can manually adjust generator constraints to target uncovered RTL regions, but this increases engineering effort and slows verification.

Coverage-directed approaches address this limitation by using coverage feedback to guide generation toward less-tested behavior. In this sense, CDG can be viewed as adding a feedback loop around test generation rather than relying solely on unconstrained or weakly constrained randomness.
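The repetition problem can be made concrete with a small experiment. The sketch below is illustrative only: the four-opcode ISA, the function names, and the choice of "opcode seen" as the coverage notion are assumptions made for this example and are not taken from ProcessorFuzz or any real generator. It counts how many unguided random tests add no new coverage.

```python
import random

# Hypothetical toy ISA, invented for this sketch.
OPCODES = ("add", "sub", "load", "store")

def random_test(rng, length=3):
    """Draw an unguided random instruction sequence."""
    return tuple(rng.choice(OPCODES) for _ in range(length))

def count_redundant(n_tests=50, seed=0):
    """Count tests whose coverage was already fully achieved by earlier tests."""
    rng = random.Random(seed)
    covered, redundant = set(), 0
    for _ in range(n_tests):
        hit = set(random_test(rng))   # coverage here = set of opcodes exercised
        if hit <= covered:            # nothing new: the test repeats old behavior
            redundant += 1
        covered |= hit
    return redundant, covered

redundant, covered = count_redundant()
print(f"{redundant} of 50 random tests added no new coverage")
```

Because this toy coverage space has only four bins, at most four tests can ever contribute something new, so almost all of the fifty are redundant. Real coverage spaces are far larger, but the same effect is what a feedback loop is meant to reduce.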

Common workflow

A typical CDG loop consists of:

Generate a test or stimulus: for example, an instruction sequence, a Verilog input stimulus, or a simulated human/robot interaction scenario.
Run the design or system under test: for example, RTL, a processor model, or robotic software in simulation.
Measure coverage: examples in the cited work include RTL/code coverage, software-style metrics such as statement or branch coverage, and reward signals based on coverage feedback.
Use coverage gaps to guide the next test: the generator, fuzzer, learning agent, or prompting strategy is adjusted to favor unexplored behavior.
Repeat until coverage or bug-finding goals are met.
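The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the method of any cited tool: run_dut is a hypothetical stand-in for a simulator, the coverage bins are invented, and the feedback policy (keep and mutate tests that closed a coverage gap) is just one simple choice among many.

```python
import random

# Hypothetical coverage bins for a toy two-operand design.
ALL_BINS = {"a_low", "a_high", "b_even", "b_odd", "a_eq_b"}

def run_dut(test):
    """Stand-in for simulation: return the set of coverage bins a test hits."""
    a, b = test
    bins = {"a_low" if a < 8 else "a_high", "b_even" if b % 2 == 0 else "b_odd"}
    if a == b:
        bins.add("a_eq_b")
    return bins

def generate(rng, corpus):
    """Mutate a test that previously closed a gap, else draw a fresh random test."""
    if corpus and rng.random() < 0.5:
        a, b = rng.choice(corpus)
        if rng.random() < 0.5:
            a = (a + rng.choice((-1, 1))) % 16   # nudge the first operand
        else:
            b = (b + rng.choice((-1, 1))) % 16   # nudge the second operand
        return (a, b)
    return (rng.randrange(16), rng.randrange(16))

def cdg_loop(max_tests=200, seed=0):
    rng = random.Random(seed)
    covered, corpus, tests_run = set(), [], 0
    # Repeat until the coverage goal is met or the test budget runs out.
    while covered != ALL_BINS and tests_run < max_tests:
        test = generate(rng, corpus)   # generate a test
        hit = run_dut(test)            # run the DUT and measure coverage
        tests_run += 1
        if hit - covered:              # the test closed a gap: keep it as a seed
            corpus.append(test)
            covered |= hit
    return covered, corpus, tests_run

covered, corpus, tests_run = cdg_loop()
print(sorted(covered), len(corpus), tests_run)
```

The corpus holds only tests that contributed new coverage, so it doubles as a compact regression suite. A more targeted policy would inspect ALL_BINS - covered and bias generation toward the specific missing bins.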

Applications

Hardware and processor verification

CDG is an established technique in hardware verification. The ProcessorFuzz work contrasts traditional random instruction generators with coverage-directed mechanisms: without coverage guidance, generators can produce repetitive inputs, and automatic coverage-directed mechanisms were proposed to reduce the manual effort of steering them.

Coverage metric selection is important. The same work notes that TheHuzz uses industry-standard tools and software-testing-style coverage metrics such as statement, branch, line, and expression coverage, and reports that prior work considered these metrics insufficient on their own for processor verification. This suggests that processor-oriented CDG often needs coverage signals that are meaningful for architectural or microarchitectural behavior, not only generic code-coverage metrics.
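A toy example shows why the choice of metric matters. The model below is hypothetical (a single two-valued mode register, with made-up operation names) and the transition-based metric is used purely for illustration, not as the metric of any cited tool. Two tests reach identical branch coverage, yet only one of them exercises every mode transition.

```python
def simulate(ops):
    """Toy model with a 'mode' register; returns (branch coverage, transition coverage)."""
    mode = "user"
    branches, transitions = set(), set()
    for op in ops:
        prev = mode
        if op == "trap":
            branches.add("trap_taken")
            mode = "machine"
        elif op == "ret":
            branches.add("ret_taken")
            mode = "user"
        else:
            branches.add("other")
        transitions.add((prev, mode))   # which state change this step produced
    return branches, transitions

test_a = ["trap", "ret", "nop"]
test_b = ["trap", "trap", "ret", "ret", "nop", "trap", "nop"]

branches_a, transitions_a = simulate(test_a)
branches_b, transitions_b = simulate(test_b)

print(branches_a == branches_b)             # both tests take every branch
print(sorted(transitions_b - transitions_a))  # transitions only test_b reaches
```

A generator guided only by branch coverage would see no reason to prefer test_b, even though it drives the model through a state change (a trap taken while already in machine mode) that test_a never reaches.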

LLM-aided Verilog test generation

The VerilogReader work investigates integrating a large language model into CDG. In that framework, the LLM acts as a Verilog reader: it interprets code logic and generates stimuli intended to reach unexplored code branches. The authors report that their framework outperforms random testing on designs within the LLM's comprehension scope and discuss prompt-engineering optimizations to improve understanding and accuracy.
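The general shape of such a loop can be sketched as follows. This is not the VerilogReader implementation: the prompt wording, the JSON reply format, the toy module, and every function name here are assumptions made for illustration, and fake_llm is a canned stub standing in for a real model call.

```python
import json

# Hypothetical DUT source, invented for this sketch.
VERILOG_SRC = """\
module dut(input clk, input [3:0] a, input [3:0] b, output reg hit);
  always @(posedge clk) begin
    if (a == 4'd9 && b == 4'd3) hit <= 1'b1;
    else hit <= 1'b0;
  end
endmodule
"""

def build_prompt(src, uncovered_lines):
    """Number the source lines and ask for inputs that reach the uncovered ones."""
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(src.splitlines(), 1))
    targets = ", ".join(str(n) for n in sorted(uncovered_lines))
    return (
        "You are reading Verilog RTL.\n"
        f"{numbered}\n"
        f"Lines not yet covered: {targets}\n"
        'Reply with a JSON list of objects like {"a": 0, "b": 0} '
        "(4-bit values) that would execute those lines."
    )

def parse_stimuli(reply):
    """Keep only well-formed, in-range stimuli; model output is treated as untrusted."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    good = []
    for item in items:
        if not isinstance(item, dict):
            continue
        a, b = item.get("a"), item.get("b")
        if all(type(v) is int and 0 <= v <= 15 for v in (a, b)):
            good.append((a, b))
    return good

def fake_llm(prompt):
    """Canned reply standing in for a real model call (one valid, one out-of-range item)."""
    return '[{"a": 9, "b": 3}, {"a": 99, "b": 1}]'

prompt = build_prompt(VERILOG_SRC, {3})
stimuli = parse_stimuli(fake_llm(prompt))
print(stimuli)
```

Validating the reply before simulation matters because a model can return malformed or out-of-range values. In a full loop, the surviving stimuli would be simulated and the updated list of uncovered lines fed into the next prompt.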

Robotic-software testing

CDG has also been applied outside hardware. A published robotics-testing study uses Belief-Desire-Intention (BDI) agents as models for test generation in human-robot interaction simulations, and introduces reinforcement learning to automate exploration using a reward function based on coverage feedback. The study reports that reinforcement learning can fully automate exploration of the BDI models and produce effective coverage-directed test generation.
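The idea of a coverage-based reward can be illustrated with tabular Q-learning on a very small model. This sketch does not reproduce the study's BDI agents or its reward function: the five-state chain, the two actions, the hyperparameters, and all names are assumptions chosen to keep the example short. The agent is rewarded only when an action reaches a model state not yet covered in the current episode.

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical model states 0..4
ACTIONS = (0, 1)    # 0 = reset to state 0, 1 = advance one state
EPISODE_LEN = 6

def step(pos, action):
    """Toy model transition."""
    return min(pos + 1, N_STATES - 1) if action == 1 else 0

def run_episode(q, rng, epsilon, alpha=0.5, gamma=0.9, learn=True):
    """Run one episode; the reward is 1 only when a new model state is covered."""
    pos, covered = 0, {0}
    for _ in range(EPISODE_LEN):
        # The coverage count is part of the state so the reward stays Markov.
        state = (pos, len(covered))
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                       # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        nxt = step(pos, action)
        reward = 1.0 if nxt not in covered else 0.0            # coverage feedback
        covered.add(nxt)
        if learn:
            nxt_state = (nxt, len(covered))
            best_next = max(q[(nxt_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        pos = nxt
    return covered

def train(episodes=300, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, rng, epsilon=0.2)
    return q

q_table = train()
greedy_cover = run_episode(q_table, random.Random(1), epsilon=0.0, learn=False)
print(sorted(greedy_cover))
```

After training, the greedy rollout shows which model states the learned policy covers without further exploration. In a real setting each episode would correspond to one generated test scenario run in simulation.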

Benefits and limitations

Benefits

Reduces repeated testing of already-covered functionality compared with unguided random generation.
Can automate targeting of uncovered regions that would otherwise require manual constraint adjustment.
Can be combined with different generation engines, including random instruction generators, fuzzers, reinforcement-learning agents, and LLM-based stimulus generation.

Limitations

The usefulness of CDG depends strongly on the coverage metric. Generic statement, branch, line, or expression coverage may not be sufficient on its own for processor verification.
Some hardware fuzzing approaches that rely on translating hardware into software models gain access to software-fuzzer metrics such as basic-block and edge coverage, but introduce the additional challenge of proving equivalence between the hardware design and the software model.
LLM-aided CDG is limited by the model's comprehension scope: the VerilogReader authors report gains over random testing for designs within that scope.