Coverage-guided test generation Wiki

Overview

Coverage-guided test generation is a class of testing techniques that use coverage-related feedback to steer the production of tests or instruction streams. The evidence covers three distinct application domains: RISC-V processor verification, LLM-driven software test generation, and deep learning system testing.

RISC-V processor verification

In the RISC-V context, the cited DATE2022 paper proposes a cross-level verification approach whose foundation is a randomized coverage-guided instruction stream generator that produces an endless, unrestricted instruction stream evolving dynamically at runtime based on observed coverage information. The approach leverages an Instruction Set Simulator (ISS) as a reference model in a tight co-simulation setting, with the ISS and RTL core compiled into a single binary that communicates in-memory. [C1]

Coverage information is continuously updated based on the execution state of the ISS, and the novel concept of Coverage-guided Aging is employed to smooth out the coverage distribution of the randomized instruction stream over time. In combination, these mechanisms enable broad and deep coverage, which helps expose intricate corner-case bugs in the RTL core. [C1]
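
The exact aging rule is not spelled out here, so the following is only a minimal sketch of the general idea, under stated assumptions: a randomized generator with a skewed base distribution is reweighted by recent coverage, and that coverage record decays ("ages") at every step. The instruction classes, the names BASE_WEIGHTS, generate_stream and spread, the decay factor, and the weighting formula are all hypothetical and are not taken from the cited paper.

```python
import random

# Hypothetical instruction classes with a deliberately skewed base distribution.
BASE_WEIGHTS = {"alu": 8.0, "load": 4.0, "store": 2.0, "branch": 1.0, "csr": 1.0}

def generate_stream(steps, guided, seed=0, decay=0.99):
    """Return per-class hit counts for a randomized stream of `steps` picks."""
    rng = random.Random(seed)
    classes = list(BASE_WEIGHTS)
    aged = {c: 0.0 for c in classes}   # recent coverage, decayed every step
    hits = {c: 0 for c in classes}     # total coverage, never decayed
    for _ in range(steps):
        if guided:
            # Assumed rule: recently covered classes get a lower weight.
            weights = [BASE_WEIGHTS[c] / (1.0 + aged[c]) for c in classes]
        else:
            # Plain randomized strategy: coverage is ignored.
            weights = [BASE_WEIGHTS[c] for c in classes]
        choice = rng.choices(classes, weights=weights, k=1)[0]
        for c in classes:
            aged[c] *= decay           # older observations fade out
        aged[choice] += 1.0
        hits[choice] += 1
    return hits

def spread(hits):
    """Gap between the most and the least covered class."""
    return max(hits.values()) - min(hits.values())

if __name__ == "__main__":
    print("unguided spread:", spread(generate_stream(5000, guided=False)))
    print("guided spread:  ", spread(generate_stream(5000, guided=True)))
```

In this toy setting the guided stream ends up with a noticeably flatter distribution over instruction classes than the unguided one, which is the qualitative effect the text describes. The decay keeps the feedback focused on recent behavior rather than on the whole history of an endless stream.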

Architecture

The verification framework comprises an Instruction-Injector, a Coverage-Observer, a Core-Adapter, the RTL-Core, the RTL-Memory, the ISS, and the ISS-Memory. The Instruction-Injector feeds instructions into both the RTL core and the ISS, while the Coverage-Observer tracks execution-state information used to drive the coverage-guided evolution of the instruction stream. [C1]
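
The data flow between these components can be illustrated with a toy co-simulation loop. This is a sketch under strong simplifying assumptions, not the framework from the paper: ToyISS and ToyRTL are hypothetical four-register models written in Python (the real setup compiles the ISS and the RTL core into one binary), the bug in ToyRTL is injected purely for illustration, and the coverage points recorded by CoverageObserver are invented.

```python
class ToyISS:
    """Reference model: four 32-bit registers with add/sub/xor."""
    def __init__(self):
        self.regs = [1, 2, 3, 4]

    def step(self, instr):
        op, rd, rs1, rs2 = instr
        a, b = self.regs[rs1], self.regs[rs2]
        result = {"add": a + b, "sub": a - b, "xor": a ^ b}[op]
        self.regs[rd] = result & 0xFFFFFFFF

class ToyRTL(ToyISS):
    """Stand-in for the RTL core with an injected bug: sub into r3 acts as add."""
    def step(self, instr):
        op, rd, rs1, rs2 = instr
        if op == "sub" and rd == 3:
            instr = ("add", rd, rs1, rs2)
        super().step(instr)

class CoverageObserver:
    """Records (opcode, result-is-zero) points from the ISS execution state."""
    def __init__(self):
        self.seen = set()

    def observe(self, instr, iss):
        op, rd, _, _ = instr
        self.seen.add((op, iss.regs[rd] == 0))

def co_simulate(stream, iss, rtl, observer):
    """Inject each instruction into both models; return the first diverging index."""
    for index, instr in enumerate(stream):
        iss.step(instr)                # reference execution
        rtl.step(instr)                # design under test
        observer.observe(instr, iss)   # coverage comes from the ISS state
        if iss.regs != rtl.regs:       # architectural state comparison
            return index
    return None

if __name__ == "__main__":
    stream = [("add", 0, 1, 2), ("sub", 3, 1, 2)]
    observer = CoverageObserver()
    print(co_simulate(stream, ToyISS(), ToyRTL(), observer), observer.seen)
```

Feeding the same instruction to both models and comparing state after every step is what lets a mismatch be attributed to a single instruction. A coverage-guided injector would additionally consult observer.seen when choosing the next instruction.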

Experimental evaluation

Experiments are performed on the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series. The reported outcome is a much more regular coverage distribution of the randomized instruction stream, which the paper attributes to the combined effect of runtime coverage feedback and Coverage-guided Aging. [C1]

Motivation and prior limitations in the RISC-V setting

The paper motivates coverage-guided generation by pointing out limitations of a prior academic approach that integrates the ISS with the RTL core in a very efficient co-simulation compiled into a single binary with in-memory communication. Although that earlier approach generates endless instruction streams and supports arbitrary combinations of load/store and CSR instructions as well as infinite loops, it does not collect or use runtime coverage information to assess and guide the test generation process. Instead, it relies on a simple randomized test strategy, which the cited paper argues makes it very difficult to continuously achieve broad and deep test coverage in endless instruction streams. [C1]

LLM-based coverage-guided software test generation

Outside of hardware verification, coverage-guided test generation has been applied to LLM-driven software test generation. The SymPrompt approach presents a code-aware prompting strategy that deconstructs the test suite generation process into a multi-stage sequence, where each stage is driven by a prompt aligned with the execution paths of the method under test and exposes relevant type and dependency focal context to the model. The approach builds on the observation that LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion, and it enables pretrained LLMs to generate more complete test cases without any additional training. [P1]

SymPrompt is implemented using the TreeSitter parsing framework and evaluated on a benchmark of challenging methods from open-source Python projects. Reported results include a 5x enhancement in correct test generations, a 26% relative coverage improvement for CodeGen2, and over 2x coverage improvement for GPT-4 compared to baseline prompting strategies. [P1]
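
To make the idea of path-aligned prompts concrete, the sketch below enumerates the branch conditions along each execution path of a small function and builds one prompt per path. It is not SymPrompt's implementation: it uses Python's standard ast module in place of TreeSitter, handles only if/else, return and raise (loops and other statements are treated as straight-line code), and the prompt wording, the classify example, and the helper names are all hypothetical. ast.unparse requires Python 3.9 or later.

```python
import ast

SOURCE = '''def classify(x, y):
    if x < 0:
        return "neg"
    if y is None:
        y = 0
    return x + y
'''

def enumerate_paths(stmts):
    """Return (open_paths, finished_paths); a path is a list of condition strings."""
    open_paths, finished = [[]], []
    for stmt in stmts:
        if isinstance(stmt, (ast.Return, ast.Raise)):
            finished.extend(open_paths)      # every open path ends here
            return [], finished
        if isinstance(stmt, ast.If):
            cond = ast.unparse(stmt.test)
            next_open = []
            for branch, label in ((stmt.body, cond), (stmt.orelse, f"not ({cond})")):
                b_open, b_done = enumerate_paths(branch)
                for prefix in open_paths:
                    next_open += [prefix + [label] + p for p in b_open]
                    finished += [prefix + [label] + p for p in b_done]
            open_paths = next_open
    return open_paths, finished

def path_constraints(source, func_name):
    """Collect the branch conditions along each path of the named function."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree)
                if isinstance(n, ast.FunctionDef) and n.name == func_name)
    open_paths, finished = enumerate_paths(func.body)
    return finished + open_paths

def path_prompts(source, func_name):
    """Build one (hypothetical) prompt per execution path."""
    prompts = []
    for constraints in path_constraints(source, func_name):
        where = " and ".join(constraints) or "no branch is taken"
        prompts.append(f"Write a pytest test for `{func_name}` covering the path "
                       f"where {where}.\n\n{source}")
    return prompts

if __name__ == "__main__":
    for constraints in path_constraints(SOURCE, "classify"):
        print(constraints)
```

Each resulting prompt targets a single path, so a model is asked for one focused test at a time instead of a whole test suite in one shot. A fuller version would also add the type and dependency context that the text mentions.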

Combinatorial testing for deep learning systems

In the deep learning domain, the evidence describes a coverage-guided test generation technique adapted from combinatorial testing (CT). The motivating challenge is that, if each neuron is treated as a runtime state, a DL system's runtime state space is too large to test exhaustively. The paper therefore adapts the CT concept to propose a set of coverage criteria for DL systems, together with a CT coverage-guided test generation technique. The reported evaluation indicates that CT provides a promising avenue for testing DL systems. [P2]
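
As an illustration of how a combinatorial criterion can stand in for exhaustive state coverage, the sketch below measures generic 2-way coverage over binary neuron activations and greedily selects inputs that add new pairwise configurations. The specific criteria of the cited paper are not given here, so this is an assumption-laden example: activation vectors are assumed to be precomputed by thresholding the neuron outputs of one layer, and all function names are hypothetical.

```python
from itertools import combinations

def two_way_configs(activation):
    """All (i, j, a_i, a_j) pair configurations present in one binary vector."""
    return {(i, j, activation[i], activation[j])
            for i, j in combinations(range(len(activation)), 2)}

def two_way_coverage(activations):
    """Fraction of all 2-way activation configurations covered by the vectors."""
    n = len(activations[0])
    total = 4 * (n * (n - 1) // 2)     # 4 on/off combinations per neuron pair
    covered = set()
    for vec in activations:
        covered |= two_way_configs(vec)
    return len(covered) / total

def greedy_select(candidates, budget):
    """Pick up to `budget` vectors, each time the one adding the most new configs."""
    covered, chosen = set(), []
    remaining = list(candidates)
    for _ in range(budget):
        best = max(remaining, key=lambda v: len(two_way_configs(v) - covered),
                   default=None)
        if best is None or not (two_way_configs(best) - covered):
            break                      # nothing left that improves coverage
        chosen.append(best)
        covered |= two_way_configs(best)
        remaining.remove(best)
    return chosen

if __name__ == "__main__":
    tests = [(0, 0, 0), (0, 0, 0), (1, 1, 1)]
    print(two_way_coverage(tests), greedy_select(tests, budget=3))
```

With n neurons there are 2^n joint activation states but only 4 * n * (n - 1) / 2 pairwise configurations, which is why a pairwise target stays tractable where exhaustive coverage does not.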

Related concepts and techniques

Two entities in the knowledge graph are directly tied to coverage-guided test generation in the evidence:

Instruction Injection is a Technique that implements coverage-guided test generation, exemplified in the DATE2022 architecture by the Instruction-Injector that delivers instructions into both the RTL core and the ISS under coverage-driven feedback.
Coverage-guided Aging is a Concept that extends coverage-guided test generation by smoothing the coverage distribution of the randomized instruction stream over time, enabling a more regular and broad coverage profile.

Evidence-bounded takeaway

Within the provided evidence, coverage-guided test generation is realized in three distinct ways: as a randomized RISC-V instruction-stream generator that evolves under runtime coverage feedback in tight co-simulation with an ISS and is regularized by Coverage-guided Aging; as an LLM code-aware prompting strategy that aligns test generation with execution paths; and as a combinatorial-testing-driven technique adapted to deep learning systems. The common thread is the use of coverage information (observed at runtime, structured by execution paths, or defined by combinatorial criteria) to steer the generation of tests or instruction streams toward broader and deeper coverage.