STIMSMITH

coverage-directed test generation

Technique WIKI v1 · 5/28/2026

Coverage-directed test generation (CDG) is a test-generation technique that uses coverage feedback to steer new tests toward unexercised behavior, such as unexplored RTL regions, code branches, or model states. It is used in domains including hardware verification and robotic-software simulation, and contrasts with unguided random testing, which can repeatedly exercise the same functionality.

Overview

CDG uses coverage information as feedback to steer the generation or selection of additional tests. In hardware verification, the goal is to reach uncovered RTL regions or code branches automatically instead of relying only on manual constraint tuning or unguided random tests. Published examples include systems that use large language models to generate Verilog stimuli for unexplored code branches, and systems that use reinforcement learning to explore behavioral models with rewards based on coverage feedback.

Relationship to random testing

In processor verification, random instruction generators are widely used because they require limited human expertise and scale to large RTL designs. However, the ProcessorFuzz work describes a key limitation: without coverage guidance, these tools can generate repetitive inputs that test the same processor functionality, which reduces the chance of finding bugs. Engineers can manually adjust generator constraints to target uncovered RTL regions, but this increases engineering effort and slows verification.

Coverage-directed approaches address this limitation by using coverage feedback to guide generation toward less-tested behavior. In this sense, CDG can be viewed as adding a feedback loop around test generation rather than relying solely on unconstrained or weakly constrained randomness.

Common workflow

A typical CDG loop consists of:

  1. Generate a test or stimulus: for example, an instruction sequence, a Verilog input stimulus, or a simulated human/robot interaction scenario.
  2. Run the design or system under test, such as RTL, a processor model, or robotic software in simulation.
  3. Measure coverage. Examples in the cited sources include RTL/code coverage, software-style metrics such as statement or branch coverage, and reward signals based on coverage feedback.
  4. Use coverage gaps to guide the next test: the generator, fuzzer, learning agent, or prompting strategy is adjusted to favor unexplored behavior.
  5. Repeat until coverage or bug-finding goals are met.
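
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation: the toy `run_dut` function, the coverage bins, and the `generate` helper are all assumptions made up for the example, and the generator is assumed to know how to bias a stimulus toward a chosen bin.

```python
import random

OPCODES = range(4)
CLASSES = ("zero", "mid", "high")
# Coverage bins: one "any" bin per opcode plus one per (opcode, operand class).
ALL_BINS = {(op, "any") for op in OPCODES} | {(op, c) for op in OPCODES for c in CLASSES}


def run_dut(opcode, operand):
    """Toy stand-in for simulation (hypothetical): return the bins this stimulus hits."""
    if operand == 0:
        cls = "zero"
    elif operand >= 0xF0:
        cls = "high"
    else:
        cls = "mid"
    return {(opcode, "any"), (opcode, cls)}


def generate(target, rng):
    """Step 1: generate a stimulus biased toward one uncovered bin."""
    opcode, cls = target
    if cls == "zero":
        operand = 0
    elif cls == "high":
        operand = rng.randint(0xF0, 0xFF)
    elif cls == "mid":
        operand = rng.randint(0x01, 0xEF)
    else:  # "any": no constraint on the operand
        operand = rng.randint(0x00, 0xFF)
    return opcode, operand


def cdg_loop(seed=0, max_tests=100):
    rng = random.Random(seed)
    covered, tests = set(), 0
    # Step 5: repeat until the coverage goal or the test budget is reached.
    while covered != ALL_BINS and tests < max_tests:
        # Step 4: pick a coverage gap to steer the next test.
        target = rng.choice(sorted(ALL_BINS - covered))
        opcode, operand = generate(target, rng)
        # Steps 2 and 3: run the test and merge the coverage it produced.
        covered |= run_dut(opcode, operand)
        tests += 1
    return covered, tests


covered, tests = cdg_loop()
print(f"{len(covered)}/{len(ALL_BINS)} bins covered in {tests} tests")
```

Because every iteration targets a bin that is still uncovered, this toy loop closes all 16 bins in at most 16 tests. An unguided generator drawing operands uniformly from 0 to 255 would hit the `zero` class only about once in 256 draws.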

Applications

Hardware and processor verification

CDG is a recognized technique in hardware verification. The ProcessorFuzz work contrasts traditional random instruction generators with coverage-directed mechanisms: without coverage guidance, generators can produce repetitive inputs, and automatic coverage-directed mechanisms were proposed to reduce the manual effort of steering them.

The choice of coverage metric matters. The same source notes that TheHuzz uses industry-standard tools and software-testing-style coverage metrics such as statement, branch, line, and expression coverage, and also reports that prior work considered these metrics insufficient by themselves for processor verification. This suggests that processor-oriented CDG often needs coverage signals that are meaningful for architectural or microarchitectural behavior, not only generic code-coverage metrics.
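
A small, hypothetical example shows how a code-coverage metric can saturate while a behavior-oriented metric still has gaps. The two-mode `step` model, its branch labels, and the choice of mode transitions as the second metric are assumptions for illustration and are not taken from the cited sources.

```python
MODES = ("user", "machine")
BRANCHES = {"B_trap", "B_ret", "B_other"}


def step(mode, instr):
    """Toy core model (hypothetical): return (next_mode, branch_id_taken)."""
    if instr == "trap":
        return "machine", "B_trap"
    if instr == "ret":
        return "user", "B_ret"
    return mode, "B_other"


def measure(program, mode="user"):
    """Run a program and return (branch coverage, mode-transition coverage)."""
    branches, transitions = set(), set()
    for instr in program:
        nxt, branch = step(mode, instr)
        branches.add(branch)
        transitions.add((mode, nxt))
        mode = nxt
    all_transitions = {(a, b) for a in MODES for b in MODES}
    return len(branches) / len(BRANCHES), len(transitions) / len(all_transitions)


branch_cov, transition_cov = measure(["trap", "ret", "nop"])
print(branch_cov, transition_cov)  # 1.0 0.75
```

The three-instruction program takes every branch in `step`, yet it never executes an instruction that stays in machine mode, so the `("machine", "machine")` transition remains unexercised. A generator steered only by branch coverage would see nothing left to target here.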

LLM-aided Verilog test generation

The VerilogReader work investigates integrating a large language model into CDG. In that framework, the LLM acts as a Verilog reader: it interprets code logic and generates stimuli intended to reach unexplored code branches. The authors report that their framework outperforms random testing on designs within the LLM's comprehension scope, and they discuss prompt-engineering optimizations to improve understanding and accuracy.
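
The general shape of such a step can be sketched as follows. Everything here is hypothetical: the prompt wording, the `name=value` reply format, and the `llm` callable are assumptions for illustration, and this is not VerilogReader's actual prompt or interface.

```python
from typing import Callable


def build_prompt(verilog_src: str, uncovered: list[str]) -> str:
    """Combine the design source with the current coverage gaps."""
    lines = [
        "You are reading the following Verilog module:",
        verilog_src.strip(),
        "These branches have not been covered yet:",
        *[f"- {branch}" for branch in uncovered],
        "Reply with one input vector per line as name=value pairs.",
    ]
    return "\n".join(lines)


def parse_stimuli(reply: str) -> list[dict[str, int]]:
    """Keep only the reply lines that parse as name=value integer pairs."""
    vectors = []
    for line in reply.splitlines():
        tokens = [tok.split("=", 1) for tok in line.split() if "=" in tok]
        try:
            # Base 0 accepts decimal, 0x..., 0b..., and 0o... literals.
            vector = {name: int(value, 0) for name, value in tokens}
        except ValueError:
            continue  # skip lines the model did not format as requested
        if vector:
            vectors.append(vector)
    return vectors


def next_stimuli(verilog_src: str, uncovered: list[str],
                 llm: Callable[[str], str]) -> list[dict[str, int]]:
    """One CDG step: ask the model for stimuli aimed at the uncovered branches."""
    return parse_stimuli(llm(build_prompt(verilog_src, uncovered)))
```

Passing the model in as a plain callable keeps the sketch independent of any specific LLM client, and the tolerant parser reflects that model replies may not always follow the requested format.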

Robotic-software testing

CDG has also been applied outside hardware. A robotics-testing study uses Belief-Desire-Intention (BDI) agents as models for test generation in human-robot interaction simulations, and introduces reinforcement learning to automate exploration using a reward function based on coverage feedback. The study reports that reinforcement learning can fully automate exploration of the BDI models and produce effective coverage-directed test generation.
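
The idea of a coverage-based reward can be illustrated with a small tabular Q-learning sketch. The three-state model, the choice of Q-learning, and all hyperparameters are assumptions for the example; the study's actual BDI models and learning setup are not reproduced here.

```python
import random
from collections import defaultdict

# Hypothetical toy model: each state maps its available actions to a next state.
MODEL = {
    "idle": {"greet": "talking", "wait": "idle"},
    "talking": {"hand_over": "handing", "leave": "idle"},
    "handing": {"release": "idle", "hold": "handing"},
}
ALL_PAIRS = {(s, a) for s, actions in MODEL.items() for a in actions}


def coverage_reward(pair, covered):
    """Pay 1.0 the first time a (state, action) pair is exercised, else 0.0."""
    if pair in covered:
        return 0.0
    covered.add(pair)
    return 1.0


def explore(episodes=50, steps=6, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values, default 0.0
    covered = set()
    for _ in range(episodes):
        state = "idle"
        for _ in range(steps):
            actions = sorted(MODEL[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            reward = coverage_reward((state, action), covered)
            nxt = MODEL[state][action]
            best_next = max(q[(nxt, a)] for a in MODEL[nxt])
            # Standard Q-learning update with the coverage-based reward.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return covered, q


covered, _ = explore()
print(f"covered {len(covered)} of {len(ALL_PAIRS)} state-action pairs")
```

Because the reward is paid only for pairs that have not been covered yet, the learned values favor paths that lead toward unexplored parts of the model, and the reward signal fades as coverage saturates.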

Benefits and limitations

Benefits

  • Reduces repeated testing of already-covered functionality compared with unguided random generation.
  • Can automate targeting of uncovered regions that would otherwise require manual constraint adjustment.
  • Can be combined with different generation engines, including random instruction generators, fuzzers, reinforcement-learning agents, and LLM-based stimulus generation.

Limitations

  • The usefulness of CDG depends strongly on the coverage metric. Generic statement, branch, line, or expression coverage may not be sufficient on its own for processor verification.
  • Some hardware fuzzing approaches that rely on translating hardware into software models gain access to software-fuzzer metrics such as basic-block and edge coverage, but introduce the additional challenge of proving equivalence between the hardware design and the software model.
  • LLM-aided CDG is limited by the model's comprehension scope, according to the VerilogReader public summary.

CITATIONS

[1] Random instruction generators are commonly used in processor verification because they require limited human expertise and scale to large RTL designs. ProcessorFuzz: Processor Fuzzing with Control and
[2] Unguided random instruction generation can produce repetitive inputs that exercise the same processor functionality, reducing bug-finding chances. ProcessorFuzz: Processor Fuzzing with Control and
[3] Manual adjustment of constraints to target uncovered RTL regions increases engineering effort and slows verification. ProcessorFuzz: Processor Fuzzing with Control and
[4] TheHuzz uses coverage metrics such as statement, branch, line, and expression coverage, but prior work discussed in the source considers these metrics insufficient by themselves for processor verification. ProcessorFuzz: Processor Fuzzing with Control and
[5] Hardware-to-software-model fuzzing can use software-fuzzer coverage metrics such as basic-block and edge coverage, but introduces the challenge of proving equivalence between the hardware design and software model. ProcessorFuzz: Processor Fuzzing with Control and
[6] VerilogReader integrates an LLM into coverage-directed test generation, using it to understand Verilog code logic and generate stimuli for unexplored code branches. VerilogReader: LLM-Aided Hardware Test Generation
[7] The VerilogReader public summary reports that the framework outperforms random testing on designs within the LLM's comprehension scope and proposes prompt-engineering optimizations. VerilogReader: LLM-Aided Hardware Test Generation
[8] In robotic-software testing for human-robot interaction simulations, reinforcement learning has been used to automate BDI-model exploration with a reward function based on coverage feedback, leading to effective coverage-directed test generation. Intelligent Agent-Based Stimulation for Testing Robotic Software in Human-Robot Interactions