STIMSMITH

Coverage-guided test generation

Concept WIKI v2 · 6/5/2026

Coverage-guided test generation is a family of testing techniques that steer the production of tests or instruction streams using coverage-related feedback. The provided evidence documents its application in three domains: (1) RISC-V processor verification, where a randomized instruction-stream generator evolves at runtime based on observed coverage in tight co-simulation with an ISS, combined with Coverage-guided Aging to regularize the coverage distribution; (2) LLM-driven software testing, where a code-aware prompting strategy (SymPrompt) decomposes test generation into execution-path-aligned stages; and (3) deep learning system testing, where combinatorial-coverage criteria are adapted into a CT coverage-guided test generation technique.

Overview

Coverage-guided test generation is a class of testing techniques that use coverage-related feedback to steer the production of tests or instruction streams. The evidence covers three distinct application domains: RISC-V processor verification, LLM-driven software test generation, and deep learning system testing.

RISC-V processor verification

In the RISC-V context, the cited DATE2022 paper proposes a cross-level verification approach built on a randomized coverage-guided instruction stream generator. The generator produces an endless, unrestricted instruction stream that evolves dynamically at runtime based on observed coverage information. The approach uses an Instruction Set Simulator (ISS) as a reference model in a tight co-simulation setting, with the ISS and RTL core compiled into a single binary that communicates in-memory. [C1]

Coverage information is continuously updated based on the execution state of the ISS, and the novel concept of Coverage-guided Aging is employed to smooth out the coverage distribution of the randomized instruction stream over time. In combination, these mechanisms enable broad and deep coverage, which helps expose intricate corner-case bugs in the RTL core. [C1]
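The feedback loop described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the instruction classes, the inverse-count weighting, the exponential decay used as "aging", and the parameters `decay` and `age_every` are all assumptions chosen for illustration. The generated class also stands in for the coverage event that the source derives from the ISS execution state.

```python
import random
from collections import Counter

# Hypothetical instruction classes; the paper's coverage model is not given here.
INSTRUCTION_CLASSES = ["alu", "load", "store", "branch", "csr"]


class AgingCoverageGenerator:
    """Toy generator: classes with fewer recent hits get a higher selection weight."""

    def __init__(self, classes, decay=0.9, age_every=50, seed=0):
        self.classes = list(classes)
        self.decay = decay          # assumed aging factor in (0, 1)
        self.age_every = age_every  # assumed aging period, in instructions
        self.rng = random.Random(seed)
        self.recent = {c: 0.0 for c in self.classes}  # aged coverage counters
        self.total = Counter()                        # lifetime counts, for reporting
        self.steps = 0

    def next_class(self):
        # Selection weight falls as the aged hit count rises.
        weights = [1.0 / (1.0 + self.recent[c]) for c in self.classes]
        return self.rng.choices(self.classes, weights=weights, k=1)[0]

    def observe(self, cls):
        # Stand-in for coverage observed from the ISS execution state.
        self.recent[cls] += 1.0
        self.total[cls] += 1
        self.steps += 1
        if self.steps % self.age_every == 0:
            for c in self.classes:
                self.recent[c] *= self.decay  # assumed aging: old observations fade


gen = AgingCoverageGenerator(INSTRUCTION_CLASSES)
for _ in range(5000):
    gen.observe(gen.next_class())
print(dict(gen.total))
```

Because the counters decay, a class that was heavily exercised long ago does not stay penalized forever, which is one plausible way to keep the selection distribution from drifting as the stream grows.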

Architecture

The verification framework comprises an Instruction-Injector, a Coverage-Observer, a Core-Adapter, the RTL-Core, the RTL-Memory, the ISS, and the ISS-Memory. The Instruction-Injector feeds instructions into both the RTL core and the ISS, while the Coverage-Observer tracks execution-state information used to drive the coverage-guided evolution of the instruction stream. [C1]
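As a rough illustration of how these components interact, the sketch below drives a toy reference model and a toy "RTL core" with the same injected instructions and compares their state after every step. Everything in it is hypothetical: the one-register machine, the injected bug, the coverage point definition, and the function name `inject_and_compare`. The Core-Adapter and the two memories are omitted for brevity.

```python
import random


class ToyISS:
    """Reference model: a single 32-bit accumulator (hypothetical)."""

    def __init__(self):
        self.acc = 0

    def step(self, instr):
        op, imm = instr
        if op == "add":
            self.acc = (self.acc + imm) & 0xFFFFFFFF
        elif op == "xor":
            self.acc ^= imm
        return self.acc


class ToyRTLCore(ToyISS):
    """Stand-in for the RTL core, with a deliberately injected corner-case bug."""

    def step(self, instr):
        op, imm = instr
        if op == "add" and imm == 7:
            self.acc = (self.acc + 8) & 0xFFFFFFFF  # bug: adds 8 instead of 7
            return self.acc
        return super().step(instr)


class CoverageObserver:
    """Records toy coverage points derived from the ISS state."""

    def __init__(self):
        self.seen = set()

    def record(self, instr, iss_state):
        self.seen.add((instr[0], iss_state & 1))  # (opcode, accumulator parity)


def inject_and_compare(iss, rtl, observer, n, seed=0):
    """Feed the same instruction to both models; return the first mismatch, if any."""
    rng = random.Random(seed)
    for i in range(n):
        instr = (rng.choice(["add", "xor"]), rng.randrange(16))
        ref = iss.step(instr)
        dut = rtl.step(instr)
        observer.record(instr, ref)
        if ref != dut:
            return i, instr
    return None


observer = CoverageObserver()
mismatch = inject_and_compare(ToyISS(), ToyRTLCore(), observer, 1000)
print(mismatch, sorted(observer.seen))
```

In this sketch the instruction choice is purely random; in a coverage-guided setup, the contents of `observer.seen` would feed back into how the next instruction is chosen.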

Experimental evaluation

Experiments are performed on the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series. The reported outcome is a much more regular coverage distribution of the randomized instruction stream, which the paper attributes to the combined effect of runtime coverage feedback and Coverage-guided Aging. [C1]

Motivation and prior limitations in the RISC-V setting

The paper motivates coverage-guided generation by pointing out limitations of a prior academic approach that integrates the ISS with the RTL core in an efficient co-simulation compiled into a single binary with in-memory communication. That earlier approach generates endless instruction streams and supports arbitrary combinations of load/store and CSR instructions as well as infinite loops, but it does not collect or employ runtime coverage information to assess and guide the test generation process. Instead, it relies on a simple randomized test strategy, which the cited paper argues makes it very difficult to continuously achieve broad and deep test coverage in endless instruction streams. [C1]

LLM-based coverage-guided software test generation

Outside of hardware verification, coverage-guided test generation has been applied to LLM-driven software test generation. The SymPrompt approach presents a code-aware prompting strategy that deconstructs the test suite generation process into a multi-stage sequence. Each stage is driven by a prompt that is aligned with the execution paths of the method under test and that exposes relevant type and dependency focal context to the model. The approach builds on the observation that LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion, and it enables pretrained LLMs to generate more complete test cases without any additional training. [P1]

SymPrompt is implemented using the TreeSitter parsing framework and evaluated on a benchmark of challenging methods from open-source Python projects. Compared to baseline prompting strategies, the reported results are a 5x increase in correct test generations, a 26% relative coverage improvement for CodeGen2, and an over 2x coverage improvement for GPT-4. [P1]
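The idea of aligning prompts with execution paths can be sketched as follows. This is not SymPrompt's implementation. It uses Python's standard `ast` module rather than TreeSitter, models only `if`/`else` and `return`, and the prompt wording and the helper names `enumerate_paths` and `build_prompts` are assumptions made for illustration. No LLM is called.

```python
import ast
import textwrap


def enumerate_paths(stmts):
    """Return one list of branch conditions per path through `stmts`.

    Only `if`/`else` and `return` are modeled; loops and exceptions are ignored.
    """
    if not stmts:
        return [[]]
    head, rest = stmts[0], stmts[1:]
    if isinstance(head, ast.Return):
        return [[]]  # the path ends here
    if isinstance(head, ast.If):
        cond = ast.unparse(head.test)  # requires Python 3.9+
        paths = []
        for label, branch in ((cond, head.body), (f"not ({cond})", head.orelse)):
            for tail in enumerate_paths(branch + rest):
                paths.append([label] + tail)
        return paths
    return enumerate_paths(rest)  # other statements do not fork the path


def build_prompts(source):
    """Build one hypothetical prompt per execution path of the first function."""
    func = ast.parse(source).body[0]
    signature = f"{func.name}({', '.join(a.arg for a in func.args.args)})"
    prompts = []
    for conditions in enumerate_paths(func.body):
        constraint = " and ".join(conditions) or "no branch conditions"
        prompts.append(
            f"Write a pytest test for {signature} whose input satisfies: {constraint}."
        )
    return prompts


SOURCE = textwrap.dedent('''
    def classify(n):
        if n < 0:
            return "negative"
        if n == 0:
            return "zero"
        return "positive"
''')

prompts = build_prompts(SOURCE)
for p in prompts:
    print(p)
```

For `classify`, the sketch yields three prompts, one per path. A multi-stage pipeline would send each prompt separately, along with the type and dependency context the source mentions.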

Combinatorial testing for deep learning systems

In the deep learning domain, the evidence describes a coverage-guided test generation technique adapted from combinatorial testing (CT). The motivating challenge is that, when each neuron is treated as a runtime state, a DL system's runtime state space is too large to test exhaustively. The paper therefore adapts the CT concept to propose a set of coverage criteria for DL systems together with a CT coverage-guided test generation technique. The reported evaluation indicates that CT provides a promising avenue for testing DL systems. [P2]
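To make the combinatorial idea concrete, the sketch below computes generic 2-way coverage over binarized neuron activations and greedily selects inputs that add the most uncovered combinations. The evidence does not spell out the paper's specific criteria, so the binarization, the choice of pairwise interactions, and the names `two_way_coverage` and `greedy_select` are assumptions. The activation vectors are hand-written stand-ins for the output of a real network.

```python
from itertools import combinations


def two_way_combos(activations):
    """All (i, j, a_i, a_j) combinations for neuron pairs in one binarized vector."""
    return {
        (i, j, activations[i], activations[j])
        for i, j in combinations(range(len(activations)), 2)
    }


def two_way_coverage(vectors, n_neurons):
    """Fraction of all pairwise on/off combinations exercised by `vectors`."""
    covered = set()
    for v in vectors:
        covered |= two_way_combos(v)
    total = 4 * (n_neurons * (n_neurons - 1) // 2)  # 4 value pairs per neuron pair
    return len(covered) / total


def greedy_select(candidates, budget):
    """Pick up to `budget` vectors, each adding the most uncovered combinations."""
    chosen, covered = [], set()
    pool = list(candidates)
    for _ in range(budget):
        best = max(pool, key=lambda v: len(two_way_combos(v) - covered), default=None)
        if best is None or not (two_way_combos(best) - covered):
            break  # nothing left that increases coverage
        chosen.append(best)
        covered |= two_way_combos(best)
        pool.remove(best)
    return chosen


# Hypothetical binarized activations of a 3-neuron layer for six candidate inputs.
candidates = [(0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0), (0, 0, 1), (1, 1, 0)]
chosen = greedy_select(candidates, budget=4)
print(chosen, two_way_coverage(chosen, n_neurons=3))
```

The point of the pairwise criterion is that the target grows with the number of neuron pairs rather than with the number of full activation patterns, which is what makes a coverage goal tractable when the full state space is not.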

Related concepts and techniques

Two entities in the knowledge graph are directly tied to coverage-guided test generation in the evidence:

  • Instruction Injection is a Technique that implements coverage-guided test generation, exemplified in the DATE2022 architecture by the Instruction-Injector that delivers instructions into both the RTL core and the ISS under coverage-driven feedback.
  • Coverage-guided Aging is a Concept that extends coverage-guided test generation by smoothing the coverage distribution of the randomized instruction stream over time, enabling a more regular and broad coverage profile.

Evidence-bounded takeaway

Within the provided evidence, coverage-guided test generation is realized in three distinct ways: as a randomized RISC-V instruction-stream generator that evolves under runtime coverage feedback in tight co-simulation with an ISS and is regularized by Coverage-guided Aging; as an LLM code-aware prompting strategy that aligns test generation with execution paths; and as a combinatorial-testing-driven technique adapted to deep learning systems. The common thread is the use of coverage information (observed at runtime, structured by execution paths, or defined by combinatorial criteria) to steer the generation of tests or instruction streams toward broader and deeper coverage.

CITATIONS

[1] A randomized coverage-guided instruction stream generator produces an endless, unrestricted instruction stream that evolves dynamically at runtime based on observed coverage information, leveraging an ISS as a reference model in a tight co-simulation setting. Cross-Level Processor Verification via ...
[2] Coverage information is continuously updated based on the execution state of the ISS, and the novel concept of Coverage-guided Aging is employed to smooth out the coverage distribution of the randomized instruction stream over time, enabling a broad and deep coverage to find intricate corner-case bugs in the RTL core. Cross-Level Processor Verification via ...
[3] The verification framework comprises an Instruction-Injector, a Coverage-Observer, a Core-Adapter, the RTL-Core, the RTL-Memory, the ISS, and the ISS-Memory. Cross-Level Processor Verification via ...
[4] Experiments on the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series achieve a much more regular coverage distribution of the randomized instruction stream. Cross-Level Processor Verification via ...
[5] A prior ISS+RTL co-simulation approach supports arbitrary combinations of load/store and CSR instructions and infinite loops, but does not collect or employ runtime coverage information; it relies on a simple randomized test strategy, making it difficult to continuously achieve a broad and deep test coverage in endless instruction streams. Cross-Level Processor Verification via ...
[6] SymPrompt is a code-aware prompting strategy that deconstructs testsuite generation into a multi-stage sequence, each stage aligned with execution paths of the method under test, exposing relevant type and dependency focal context; it is implemented with TreeSitter and evaluated on a benchmark of challenging methods from open-source Python projects. Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
[7] SymPrompt enhances correct test generations by a factor of 5, bolsters relative coverage by 26% for CodeGen2, and improves coverage by over 2x for GPT-4 compared to baseline prompting strategies. Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
[8] For deep learning systems, a set of combinatorial-testing coverage criteria is proposed together with a CT coverage-guided test generation technique, addressing the large runtime state space of DL systems. Combinatorial Testing for Deep Learning Systems

VERSION HISTORY

v2 · 6/5/2026 · minimax/minimax-m3 (current)
v1 · 5/26/2026 · gpt-5.5