Test Case Shrinking Wiki

Overview

Test case shrinking is a verification technique that reduces generated instruction sequences to smaller counterexamples, or minimal tests, while preserving the behavior needed to demonstrate a bug. In the supplied RISC-V CPU verification evidence, it appears in two closely related forms. TestRIG uses direct instruction injection, which makes it straightforward to shrink instruction sequences, including those containing branches. Symbolic QED is described as generating minimal tests for verification, including post-silicon verification, using a formal model of the pipeline.

Use in TestRIG

TestRIG is a randomized RISC-V CPU testing ecosystem based on RVFI-DII instrumentation and direct instruction injection. The evidence states that instruction injection “allows straightforward shrinking of sequences with branches.” This is significant because branches can otherwise make instruction-stream reduction difficult: removing or changing instructions may alter control flow and invalidate the failing behavior.
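The reduction loop itself can be sketched in a few lines. The following is a minimal illustration of greedy shrinking, not TestRIG's actual shrinker. It assumes the injected sequence is consumed in trace order, so deleting an instruction does not relocate later ones. The still_fails oracle, the toy_oracle function, and the mnemonic strings are hypothetical placeholders; in practice the oracle would rerun the candidate sequence and check that the failure still occurs.

```python
from typing import Callable, List, Sequence


def shrink(seq: Sequence[str],
           still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop one instruction at a time while the failure persists.

    `still_fails` is a hypothetical oracle: it returns True when the
    candidate sequence still demonstrates the bug.
    """
    current = list(seq)
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # removal kept the failure: accept it
                changed = True
            else:
                i += 1  # this instruction is needed: keep it, move on
    return current


def toy_oracle(candidate: List[str]) -> bool:
    # Assumption for illustration only: the "bug" needs a branch
    # followed, at some later point, by a load.
    return ("beq" in candidate and "lw" in candidate
            and candidate.index("beq") < candidate.index("lw"))


trace = ["addi", "beq", "add", "sub", "lw", "xor"]
reduced = shrink(trace, toy_oracle)
print(reduced)
```

Because every candidate is simply a shorter list that is injected as-is, the loop needs no special handling for the branch. That is the property the sketch is meant to show.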

TestRIG examples also show shrink-control annotations in reduced counterexamples. One shrunken counterexample for a CHERI cSetBoundsImmediate vulnerability contains .noshrink, .shrink, and .assert directives. The .noshrink regions preserve setup and deterministic observation state, while the shrinkable region isolates the illegal capability-bounds operation and subsequent load that demonstrates the bug. The evidence explains that .noshrink was required to initialize performance counters so that the final assertion on the L1 cache-miss counter was deterministic.
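How such annotations could constrain a shrinker is sketched below. The evidence names the directives but not their exact syntax, so the trace format here is an assumption: each directive sits on its own line, .noshrink and .shrink switch the mode for the lines that follow, and directive lines (including .assert) are never removed. The instruction names, the parse and shrink_annotated helpers, and the oracle are all hypothetical.

```python
from typing import Callable, List, Tuple

Line = Tuple[str, bool]  # (text, may_be_removed)


def parse(trace_text: str) -> List[Line]:
    """Tag each line with whether a shrinker may remove it."""
    shrinkable = True  # assumption: shrinkable unless marked otherwise
    lines: List[Line] = []
    for raw in trace_text.splitlines():
        text = raw.strip()
        if not text:
            continue
        if text == ".noshrink":
            shrinkable = False
        elif text == ".shrink":
            shrinkable = True
        if text.startswith("."):
            lines.append((text, False))  # directives are always kept
        else:
            lines.append((text, shrinkable))
    return lines


def shrink_annotated(lines: List[Line],
                     still_fails: Callable[[List[str]], bool]) -> List[str]:
    """One greedy pass that only tries to drop shrinkable lines."""
    current = list(lines)
    i = 0
    while i < len(current):
        _, shrinkable = current[i]
        candidate = current[:i] + current[i + 1:]
        if shrinkable and still_fails([t for t, _ in candidate]):
            current = candidate
        else:
            i += 1
    return [t for t, _ in current]


# Placeholder trace: these are not real instructions or real syntax.
EXAMPLE = """
.noshrink
setup_counters
.shrink
nop_a
bad_bounds
nop_b
load
.noshrink
read_counter
.assert counter_check
"""

reduced = shrink_annotated(
    parse(EXAMPLE),
    lambda c: "bad_bounds" in c and "load" in c,  # hypothetical oracle
)
print("\n".join(reduced))
```

In this sketch the setup and counter-read lines survive even though the toy oracle never inspects them, mirroring the way protected regions keep the final assertion deterministic.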

Example: shrinking a CHERI counterexample

The TestRIG paper gives a shrunken counterexample involving cSetBoundsImmediate. CHERI allows capability bounds only to be reduced, so an instruction that attempts to enlarge them is illegal and should raise an exception. The shrunken sequence nevertheless demonstrated that the capability the illegal instruction would have produced could be forwarded during a pipeline flush, causing a cache fill and potentially enabling side-channel attacks.

This example illustrates why shrinking is useful in hardware verification: it turns a discovered failure into a compact, interpretable counterexample that separates necessary setup from the minimal failing behavior.

Relationship to minimal-test generation

Symbolic QED is described as another approach: it generates minimal tests for verification, including post-silicon verification, from a formal pipeline model. Its goal is therefore close to that of test case shrinking, namely producing small tests or counterexamples that expose failures without irrelevant instruction context.

Practical role

Within the supplied evidence, shrinking supports debugging and regression work by making failures easier to understand. TestRIG is reported to find architectural bugs, microarchitectural mistakes such as register-forwarding or pipeline-flush problems, memory mistakes such as cache bugs or memory speculation failures, and unexpected feature interactions. Shrinking helps convert those discoveries into concise counterexamples suitable for analysis.