Constraint Solving for Test Generation Wiki

Overview

Constraint solving for test generation derives constraints from a model specification and solves them to automatically construct tests that satisfy particular architectural conditions. In processor testing, the goal is to generate instruction sequences that reach a desired state while avoiding undefined behavior. Prior CHERI work generated tests from a formal CHERI-MIPS ISA model written in L3: the model was compiled from L3 to HOL4, and constraint solving was then used to generate such instruction sequences. The same approach is reported to have been applied to the CHERI ARM Morello instruction set from a Sail model. ^[1]
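The idea of generating a sequence that reaches a desired state while avoiding undefined behavior can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-instruction accumulator machine, the names `step`, `run`, and `solve`, and the choice of division by zero as the only undefined behavior. Bounded exhaustive search stands in for a real constraint solver; it is not the method used in the CHERI work.

```python
from itertools import product

# Hypothetical toy ISA: each instruction is an (opcode, immediate) pair.
OPS = ("addi", "muli", "divi")
IMMS = (0, 1, 2, 3)


def step(acc, instr):
    """Return the next accumulator value, or None if the step is undefined."""
    op, imm = instr
    if op == "addi":
        return acc + imm
    if op == "muli":
        return acc * imm
    if op == "divi":
        # Division by zero is this toy model's only undefined behavior.
        return None if imm == 0 else acc // imm
    raise ValueError(f"unknown opcode: {op}")


def run(seq, acc=0):
    """Execute a sequence from acc; None means some step was undefined."""
    for instr in seq:
        acc = step(acc, instr)
        if acc is None:
            return None
    return acc


def solve(target, max_len=4):
    """Return a shortest fully defined sequence ending with acc == target."""
    alphabet = list(product(OPS, IMMS))
    for length in range(max_len + 1):
        for seq in product(alphabet, repeat=length):
            # Undefined sequences yield None and so never match the target.
            if run(seq) == target:
                return list(seq)
    return None


test = solve(12)
print(test, run(test))
```

A production tool would hand the same two conditions (target state reached, no undefined step) to an SMT solver or a theorem prover rather than enumerate candidates, since enumeration grows exponentially with sequence length.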

Role in architecture-level test generation

The cited sources contrast model-derived generation with more direct randomized testing workflows. Constraint solving is useful when test generation must target specific deep states in an architectural model rather than merely sample random instruction streams. The TestRIG paper notes an expectation that future tooling could automate the generation of templates targeting specific deep states in the RISC-V architectural model using constraint solving. ^[1]
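Why random sampling struggles with deep states can be shown by counting. The sketch below assumes a hypothetical three-opcode machine in which `push` deepens the state, `reset` returns to depth zero, and `nop` does nothing; it enumerates every stream of a fixed length and counts how many end at a chosen depth. The machine and the names `final_depth` and `count_hits` are illustrative only and do not come from the RISC-V model.

```python
from itertools import product

# Hypothetical opcodes for a toy machine with a single "depth" counter.
OPS = ("push", "reset", "nop")


def final_depth(seq):
    """Depth after executing seq: the number of pushes since the last reset."""
    depth = 0
    for op in seq:
        if op == "push":
            depth += 1
        elif op == "reset":
            depth = 0
    return depth


def count_hits(length, target_depth):
    """Count the streams of a given length that end at target_depth."""
    hits = 0
    total = 0
    witness = None
    for seq in product(OPS, repeat=length):
        total += 1
        if final_depth(seq) == target_depth:
            hits += 1
            if witness is None:
                witness = seq  # first stream found that reaches the target
    return hits, total, witness


hits, total, witness = count_hits(length=6, target_depth=5)
print(f"{hits} of {total} streams reach depth 5, e.g. {witness}")
```

In this toy setting fewer than one stream in a hundred reaches the target, so a uniformly random generator would need many attempts, whereas a search directed by the target condition produces a witness directly.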

Relationship to templates and generators

Constraint-guided generation is also associated with template-based systems. IBM’s Genesys-Pro is described as being built on templates that are used to intelligently solve for desired deep states. ^[2]
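A template in this sense can be pictured as a fixed instruction shape with holes that a solver fills in. The sketch below is a hypothetical illustration, not a description of Genesys-Pro: the `?a`/`?b` hole syntax, the two-instruction mini-ISA (`li`, `add`), and the functions `instantiate`, `execute`, and `solve_template` are all assumptions, and enumeration over a small domain stands in for a real constraint solver.

```python
from itertools import product

# Hypothetical template: a fixed instruction shape with named holes (?a, ?b).
TEMPLATE = [
    ("li", "r1", "?a"),
    ("li", "r2", "?b"),
    ("add", "r3", "r1", "r2"),
]


def instantiate(template, binding):
    """Replace each hole with its bound value; other fields are kept."""
    return [tuple(binding.get(field, field) for field in instr)
            for instr in template]


def execute(program):
    """Run the toy program and return the final register file."""
    regs = {}
    for instr in program:
        if instr[0] == "li":
            _, rd, imm = instr
            regs[rd] = imm
        elif instr[0] == "add":
            _, rd, rs1, rs2 = instr
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode: {instr[0]}")
    return regs


def solve_template(template, constraint, domain=range(16)):
    """Return the first instantiation whose final state meets the constraint."""
    holes = sorted({field for instr in template for field in instr
                    if isinstance(field, str) and field.startswith("?")})
    for values in product(domain, repeat=len(holes)):
        program = instantiate(template, dict(zip(holes, values)))
        if constraint(execute(program)):
            return program
    return None


program = solve_template(
    TEMPLATE, lambda regs: regs["r3"] == 20 and regs["r1"] > regs["r2"])
print(program)
```

The template fixes what kind of test is produced, while the constraint fixes which state it must reach; separating the two is what lets one template serve many target states.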

Relationship to TestRIG

TestRIG is presented primarily as a randomized, model-based testing framework for RISC-V CPUs, with counterexample reduction features such as smart shrinking, non-shrinkable initialization sequences, and assertions. Its paper identifies constraint solving as a future direction: a Sail-OCaml VEngine with direct access to the Sail RISC-V model's structures could eliminate the independent encodings in the VEngine and support automated generation of templates for deep architectural states. ^[2] ^[1]
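Counterexample reduction with a protected initialization sequence can be sketched as a greedy loop that drops one instruction at a time while the failure persists. This is a generic shrinking sketch under stated assumptions, not TestRIG's actual algorithm: the function `shrink`, the string-based instruction format, and the `fails` predicate (a failure triggered by a `csrw` followed later by an `lw`) are all hypothetical.

```python
def shrink(init, body, fails):
    """Greedily drop body instructions while init + body still fails.

    The init prefix is treated as non-shrinkable and is never modified.
    """
    if not fails(init + body):
        raise ValueError("shrinking needs a failing test to start from")
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            candidate = body[:i] + body[i + 1:]
            if fails(init + candidate):
                body = candidate  # keep the smaller failing test
                changed = True
                break             # restart the scan on the shorter body
    return init + body


def fails(test):
    """Hypothetical failure condition: a csrw followed later by an lw."""
    if "csrw" not in test:
        return False
    return "lw" in test[test.index("csrw") + 1:]


init = ["li sp"]  # hypothetical setup instruction that must be kept
body = ["addi", "csrw", "nop", "lw", "sub"]
reduced = shrink(init, body, fails)
print(reduced)
```

Keeping the initialization prefix out of the candidate set mirrors the idea that some setup code is required for the test to be meaningful at all, so removing it would produce a shorter but invalid counterexample.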

Practical significance

For hardware verification, constraint solving can complement randomized testing by steering generation toward difficult-to-reach architectural conditions. The sources frame this as especially relevant for deep-state testing and for avoiding undefined behavior when generating instruction sequences from formal ISA models. ^[1]